Hello eBPF: Generating C Code (8)

Posted on April 9, 2024 by Johannes Bechberger

Welcome back to my series on eBPF. In the last blog post, we learned how to auto-layout struct members and auto-generate BPFStructTypes for annotated Java records. We’re going to extend this work today.

This is a rather short blog post, but the implementation and fixing all the bugs took far more time than expected.

Generating Struct Definitions

We saw in the last blog post how powerful Java annotation processing is for generating Java code; this week, we’ll tackle the generation of C code: In the previous blog post, we still had to write the C struct and map definitions ourselves, but writing

struct event {
  u32 e_pid;
  char e_filename[FILE_NAME_LEN];
  char e_comm[TASK_COMM_LEN];
};

when we already specified the data type properly in Java

record Event(@Unsigned int pid,
             @Size(FILE_NAME_LEN) String filename,
             @Size(TASK_COMM_LEN) String comm) {}

seems to be a great place to improve our annotation processor. There are only two problems:

The annotation processor needs to know about BPFTypes, so we have to move them in there. But the BPFTypes use the Panama API, which requires the --enable-preview flag in JDK 21, making it unusable in Java 21. So we have to move the whole library over to JDK 22, as this version includes the finalized Panama API.
There is no C code generation library comparable to JavaPoet, which we used for generating Java code.

Regarding the first problem: Moving to JDK 22 is quite easy; the only changes I had to make are listed in this gist. The only major problem was getting the Lima VM to use a current JDK 22. In the end, I resorted to just using sdkman; you can take a look at the install.sh script to see how I did it.

Regarding the second problem: We can reduce the problem of generating C code into two steps:

Create an Abstract Syntax Tree (AST) for C
Create a pretty printer for this AST

To create an AST I resorted to an ANSI C grammar for inspiration. Each AST node implements the following interface:

public interface CAST {

    List<? extends CAST> children();

    Statement toStatement();

    /** Generate pretty printed code */
    default String toPrettyString() {
        return toPrettyString("", "  ");
    }

    String toPrettyString(String indent, String increment);
}

We can then create a hierarchy of extending interfaces (PrimaryExpression, …) and implementing records (ConstantExpression, …). You can find the whole C AST on GitHub.
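
To give a feeling for how such an AST and its pretty printer fit together, here is a minimal sketch. It is not the actual hello-ebpf code: the names Node, Member, and StructDefinition are made up for this illustration, and the sketch only covers the small part of C that is needed to print the event struct.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical, heavily simplified AST; not the actual hello-ebpf classes. */
class CAstSketch {

    /** Every node can print itself with a given indentation. */
    interface Node {
        String toPrettyString(String indent, String increment);

        default String toPrettyString() {
            return toPrettyString("", "  ");
        }
    }

    /** A struct member such as "char e_comm[TASK_COMM_LEN];" (arraySize may be null). */
    record Member(String type, String name, String arraySize) implements Node {
        @Override
        public String toPrettyString(String indent, String increment) {
            String suffix = arraySize == null ? "" : "[" + arraySize + "]";
            return indent + type + " " + name + suffix + ";";
        }
    }

    /** A struct definition consisting of a name and its members. */
    record StructDefinition(String name, List<Member> members) implements Node {
        @Override
        public String toPrettyString(String indent, String increment) {
            // members are printed one indentation level deeper than the struct
            String body = members.stream()
                    .map(m -> m.toPrettyString(indent + increment, increment))
                    .collect(Collectors.joining("\n"));
            return indent + "struct " + name + " {\n" + body + "\n" + indent + "};";
        }
    }

    /** Builds the AST for the event struct from the beginning of this post. */
    static StructDefinition eventStruct() {
        return new StructDefinition("event", List.of(
                new Member("u32", "e_pid", null),
                new Member("char", "e_filename", "FILE_NAME_LEN"),
                new Member("char", "e_comm", "TASK_COMM_LEN")));
    }

    public static void main(String[] args) {
        System.out.println(eventStruct().toPrettyString());
    }
}
```

Keeping the printing logic inside the nodes means that adding a new kind of node only requires a new record; the real AST on GitHub covers far more of the C grammar.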

This leads us to an annotation processor that can automatically insert struct definitions into the C code of our eBPF program, reducing the number of hard-to-debug errors, as it is guaranteed that the Java specification and the C representation of every type are compatible.

But can we do more with annotation processing?

Generating Map Definitions

There is another definition that we can auto-generate: Map definitions like

struct
{
  __uint (type, BPF_MAP_TYPE_RINGBUF);
  __uint (max_entries, 256 * 4096);
} rb SEC (".maps");

which define maps like hash maps and ring buffers that allow the communication between user- and kernel-space.

With a little bit of annotation processing, we can define the same ring buffer from above in Java:

@BPFMapDefinition(maxEntries = 256 * 4096)
BPFRingBuffer<Event> rb;

Our annotation-processor then turns this into the C definition from above and inserts code into the constructor of the Java program that properly initializes rb.

But how does the processor know what code it should generate? By parsing the BPFMapClass annotation on BPFRingBuffer (and any other class). This annotation contains the templates for both the C and the Java code:

@BPFMapClass(
        cTemplate = """
        struct {
            __uint (type, BPF_MAP_TYPE_RINGBUF);
            __uint (max_entries, $maxEntries);
        } $field SEC(".maps");
        """,
        javaTemplate = """
        new $class<>($fd, $b1)
        """)
public class BPFRingBuffer<E> extends BPFMap {
}

Here $field is the Java field name, $maxEntries the value in the BPFMapDefinition annotation and $class the name of the Java class. $cX, $bX, $jX give the C type name, BPFType and Java class names related to the X^th type parameter.
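
The placeholder substitution itself can be sketched in a few lines. This is only an illustration under my own assumptions: the class TemplateSketch and its expand method are hypothetical, and the real processor might handle placeholders differently.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch; the processor's actual substitution logic may differ. */
class TemplateSketch {

    // a placeholder is a dollar sign followed by a name, e.g. $field or $b1
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$(\\w+)");

    /** Replaces every known $name placeholder; unknown ones are left untouched. */
    static String expand(String template, Map<String, String> values) {
        return PLACEHOLDER.matcher(template).replaceAll(match ->
                Matcher.quoteReplacement(
                        values.getOrDefault(match.group(1), match.group())));
    }

    public static void main(String[] args) {
        String cTemplate = """
                struct {
                    __uint (type, BPF_MAP_TYPE_RINGBUF);
                    __uint (max_entries, $maxEntries);
                } $field SEC(".maps");
                """;
        // values as they would come from the field name and the annotation
        System.out.print(expand(cTemplate, Map.of(
                "maxEntries", String.valueOf(256 * 4096),
                "field", "rb")));
    }
}
```

Matching whole placeholder names with a regular expression avoids accidental prefix clashes, for example between a $b1 and a $b10 placeholder.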

Ring Buffer Sample Program

When we combine all this together we can have a much simpler ring buffer sample program (see TypeProcessingSample2 on GitHub):

@BPF(license = "GPL")
public abstract class TypeProcessingSample2 extends BPFProgram {

    private static final int FILE_NAME_LEN = 256;
    private static final int TASK_COMM_LEN = 16;

    @Type(name = "event")
    record Event(
      @Unsigned int pid, 
      @Size(FILE_NAME_LEN) String filename, 
      @Size(TASK_COMM_LEN) String comm) {}

    @BPFMapDefinition(maxEntries = 256 * 4096)
    BPFRingBuffer<Event> rb;

    static final String EBPF_PROGRAM = """
            #include "vmlinux.h"
            #include <bpf/bpf_helpers.h>
            #include <bpf/bpf_tracing.h>
            #include <string.h>
              
            // This is where the struct and map
            // definitions are inserted automatically          
                  
            SEC ("kprobe/do_sys_openat2")
            int kprobe__do_sys_openat2 (struct pt_regs *ctx)
            {
               // ... // as before
            }
            """;

    public static void main(String[] args) {
        try (TypeProcessingSample2 program = 
           BPFProgram.load(TypeProcessingSample2.class)) {
            program.autoAttachProgram(
              program.getProgramByName("kprobe__do_sys_openat2"));
            // we can use the rb ring buffer directly
            // but have to set the call back
            program.rb.setCallback((buffer, event) -> {
                System.out.printf(
                  "do_sys_openat2 called by:%s " + 
                  "file:%s pid:%d\n", 
                  event.comm(), event.filename(), 
                  event.pid());
            });
            while (true) {
                // consumes all registered ring buffers
                program.consumeAndThrow();
            }
        }
    }
}

There are two other things missing in the C code that are also auto-generated: Constant defining macros and the license definition. Macros are generated for all static final fields in the program class that are defined at compile time.
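
The macro generation can be approximated as follows. Note that this sketch uses runtime reflection and only looks at static final int fields to stay short; an annotation processor works on the source model at compile time instead (where VariableElement.getConstantValue() is available). The class names MacroSketch and Program are made up for this example.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: emits one #define per static final int field. */
class MacroSketch {

    /** Stand-in for a program class with constants. */
    static class Program {
        static final int FILE_NAME_LEN = 256;
        static final int TASK_COMM_LEN = 16;
        static int notAConstant = 1; // not final, so no macro is generated
    }

    static List<String> defines(Class<?> clazz) {
        List<String> lines = new ArrayList<>();
        for (Field field : clazz.getDeclaredFields()) {
            int modifiers = field.getModifiers();
            boolean isConstant = Modifier.isStatic(modifiers)
                    && Modifier.isFinal(modifiers)
                    && field.getType() == int.class
                    && !field.isSynthetic();
            if (!isConstant) {
                continue;
            }
            try {
                field.setAccessible(true);
                lines.add("#define " + field.getName() + " " + field.getInt(null));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        defines(Program.class).forEach(System.out::println);
    }
}
```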

Conclusion

Using annotation processing allows us to reduce the amount of C code we have to write and reduces errors by generating all definitions from the Java code. This simplifies writing eBPF applications.

See you in two weeks when we tackle global variables, moving closer and closer to being able to write a small firewall with hello-ebpf’s BPF support.

This will also be the topic of a talk that I submitted together with Mohammed Aboullaite to several conferences for autumn.

Addendum

The more I work on writing my own eBPF library, the more I value the effort that the developers of other libraries, like bcc or the Go and Rust eBPF libraries, put in to create usable libraries. They do this despite the lack of proper documentation. A simple example is the detaching of attached eBPF programs: There are multiple (undocumented) methods in libbpf that might be suitable: bpf_program__unload, bpf_link__detach, bpf_link__destroy, and bpf_prog_detach, but only bpf_link__destroy properly detached a program.

This article is part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone.

Hello eBPF: Auto Layouting Structs (7)

Posted on March 25, 2024 by Johannes Bechberger

Welcome back to my series on eBPF. In the last blog post, we learned how to use ring buffers with libbpf for efficient communication. This week, we’re looking into the memory layout and alignment of structs transferred between the kernel and user-land.

Alignment is essential; it specifies how the compiler lays out structs and variables and where it puts the data in memory. Take, for example, the struct that we defined in the previous blog post in the RingSample:

#define FILE_NAME_LEN 256
#define TASK_COMM_LEN  16
                
// Structure to store the data that we want to pass to user
struct event {
  u32 e_pid;
  char e_filename[FILE_NAME_LEN];
  char e_comm[TASK_COMM_LEN];
};

Struct Example

Using Pahole in the Compiler Explorer, we can see the memory layout on amd64:

struct event {
	unsigned int               e_pid;                /*     0     4 */
	char                       e_filename[256];      /*     4   256 */
	/* --- cacheline 4 boundary (256 bytes) was 4 bytes ago --- */
	char                       e_comm[16];           /*   260    16 */

	/* size: 276, cachelines: 5, members: 3 */
	/* last cacheline: 20 bytes */
};

This means that the compiler also knows how to transform member accesses to this struct and can adequately place the event in the allocated memory.

You’ve actually seen the layouting information before, as the hello-ebpf project requires you to hand layout all structs manually:

record Event(@Unsigned int pid,
             @Size(FILE_NAME_LEN) String filename,
             @Size(TASK_COMM_LEN) String comm) {}

// define the event records layout
private static final BPFStructType<Event> eventType =
        new BPFStructType<>("rb", List.of(
        new BPFStructMember<>("e_pid",
                BPFIntType.UINT32, 0, Event::pid),
        new BPFStructMember<>("e_filename",
                new StringType(FILE_NAME_LEN),
                4, Event::filename),
        new BPFStructMember<>("e_comm",
                new StringType(TASK_COMM_LEN),
                4 + FILE_NAME_LEN, Event::comm)
   ), new AnnotatedClass(Event.class, List.of()),
   fields -> new Event((int)fields.get(0),
       (String)fields.get(1), (String)fields.get(2)));

eBPF is agnostic regarding alignment, as the compiler on your system compiles the eBPF and the C code, so the compiler can decide how to align everything.

Alignment Rules

But where do these alignment rules come from? They come from how your CPU works: CPUs usually only allow, or are optimized for, certain types of memory accesses. x86 CPUs, for example, are optimized for accessing 32-bit integers that lie at addresses that are a multiple of four. The rules are defined in the Application Binary Interface (ABI). The alignment rules for x86 (64-bit) on Linux are specified in the System V ABI Specification.

The specification lists the size and alignment of every scalar type; in general, scalar types are aligned by their size. Structs, unions, and arrays are, on the other hand, aligned based on their members:

Structures and unions assume the alignment of their most strictly aligned component. Each member is assigned to the lowest available offset with the appropriate alignment. The size of any object is always a multiple of the object’s alignment.

An array uses the same alignment as its elements, except that a local or global array variable of length at least 16 bytes or a C99 variable-length array variable always has alignment of at least 16 bytes.

Structure and union objects can require padding to meet size and alignment constraints. The contents of any padding is undefined.
System V Application Binary Interface
AMD64 Architecture Processor Supplement
Draft Version 0.99.6

ARM 64-bit has the same scalar alignments and struct alignment rules (see Procedure Call Standard for the Arm® 64-bit Architecture (AArch64)); we can therefore use the same layouting algorithm for both CPU architectures.

We can formulate the algorithm for structs as follows:

struct_alignment = 1
current_position = 0
for member in struct:
  # compute the position of the member
  # that is properly aligned
  # this introduces padding (empty space between members)
  # if there are alignment issues
  current_position = \
    math.ceil(current_position / member.alignment) * member.alignment
  member.position = current_position
  # the next position has to be after the current member
  current_position += member.size
  # the struct alignment is the maximum of all alignments
  struct_alignment = max(struct_alignment, member.alignment)
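
The same algorithm can be written as a small, self-contained Java sketch. The names (LayoutSketch, Member, Placed, Layout) are chosen just for this example and are not the hello-ebpf API. The sketch also rounds the total size up to the struct alignment, following the rule quoted above that the size of an object is a multiple of its alignment.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch of the struct layout algorithm (hypothetical names). */
class LayoutSketch {

    record Member(String name, int size, int alignment) {}

    record Placed(String name, int offset) {}

    record Layout(List<Placed> members, int size, int alignment) {}

    /** Rounds value up to the next multiple of alignment. */
    static int alignUp(int value, int alignment) {
        return (value + alignment - 1) / alignment * alignment;
    }

    static Layout layout(List<Member> members) {
        int structAlignment = 1;
        int position = 0;
        List<Placed> placed = new ArrayList<>();
        for (Member member : members) {
            // move to the next properly aligned position (may add padding)
            position = alignUp(position, member.alignment());
            placed.add(new Placed(member.name(), position));
            // the next member has to come after the current one
            position += member.size();
            structAlignment = Math.max(structAlignment, member.alignment());
        }
        // trailing padding: the size is a multiple of the struct alignment
        return new Layout(placed, alignUp(position, structAlignment), structAlignment);
    }

    public static void main(String[] args) {
        // the event struct from above: u32, char[256], char[16]
        Layout event = layout(List.of(
                new Member("e_pid", 4, 4),
                new Member("e_filename", 256, 1),
                new Member("e_comm", 16, 1)));
        System.out.println(event);
    }
}
```

For the event struct, this yields the offsets 0, 4, and 260 and a total size of 276 bytes, matching the Pahole output shown earlier.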

With this at hand, we can look at a slightly more complex example:

Struct Example with Padding

The compiler, at times, has to create an unused memory section between two members to satisfy the individual alignments. This can be seen in the following example:

struct padded_event {
  char c;  // single byte char, alignment of 1
  long l;  // alignment of 8
  int i;   // alignment of 4
  void* x; // alignment of 8
};

Using Pahole again in the Compiler Explorer, we see the layout that the compiler generates:

struct padded_event {
	char                       c;                    /*     0     1 */

	/* XXX 7 bytes hole, try to pack */

	long                       l;                    /*     8     8 */
	int                        i;                    /*    16     4 */

	/* XXX 4 bytes hole, try to pack */

	void *                     x;                    /*    24     8 */

	/* size: 32, cachelines: 1, members: 4 */
	/* sum members: 21, holes: 2, sum holes: 11 */
	/* last cacheline: 32 bytes */
};

Pahole tells us that it had to introduce 11 bytes of padding. We can visualize this as follows:

This means that we’re essentially wasting memory. I recommend reading The Lost Art of Structure Packing by Eric S. Raymond to learn more about this. If we really want to save memory, we could reorder the int with the long member, thereby only needing the padding after the char, leading to an object with 24 bytes and only 3 bytes of padding. This is really important when storing many of these structs in arrays, where the wasted memory accumulates.
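
The effect of reordering can be checked with a few lines of Java. This is a sketch under the assumption of an LP64 platform (long and pointers are 8 bytes wide, as on Linux amd64); the class PaddingSketch is made up for this example.

```java
/** Illustrative sketch: compares the size of two member orderings. */
class PaddingSketch {

    /** Returns the struct size for members given as {size, alignment} pairs. */
    static int structSize(int[][] members) {
        int position = 0;
        int structAlignment = 1;
        for (int[] member : members) {
            int size = member[0];
            int alignment = member[1];
            // round up to the member's alignment, then place the member
            position = (position + alignment - 1) / alignment * alignment;
            position += size;
            structAlignment = Math.max(structAlignment, alignment);
        }
        // the total size is a multiple of the struct alignment
        return (position + structAlignment - 1) / structAlignment * structAlignment;
    }

    public static void main(String[] args) {
        int[][] original = {{1, 1}, {8, 8}, {4, 4}, {8, 8}};  // char, long, int, void*
        int[][] reordered = {{1, 1}, {4, 4}, {8, 8}, {8, 8}}; // char, int, long, void*
        int memberBytes = 1 + 8 + 4 + 8;
        System.out.println("original:  " + structSize(original)
                + " bytes, padding " + (structSize(original) - memberBytes));
        System.out.println("reordered: " + structSize(reordered)
                + " bytes, padding " + (structSize(reordered) - memberBytes));
    }
}
```

The original order needs 32 bytes with 11 bytes of padding, the reordered one 24 bytes with 3 bytes of padding.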

But what do we do with this knowledge?

Auto-Layouting in hello-ebpf

The record that we defined in Java before contains all the information to auto-generate the BPFStructType for the class; we just need a little bit of annotation processor magic:

@Type
record Event(@Unsigned int pid,
             @Size(FILE_NAME_LEN) String filename,
             @Size(TASK_COMM_LEN) String comm) {}

This record is processed, and out comes the suitable BPFStructType:

We implemented the auto-layouting in the BPFStructType class to reduce the amount of logic in the annotation processor.
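
To illustrate how the member information can be derived from an annotated record, here is a rough sketch. It differs from the real implementation in important ways: it uses runtime reflection instead of an annotation processor, it defines its own stand-ins for the @Size and @Unsigned annotations, it only supports int and String members, and the Member record is made up for this example.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.RecordComponent;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: derives struct members from a record via reflection. */
class RecordLayoutSketch {

    // local stand-ins for the hello-ebpf annotations (not the real definitions)
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.RECORD_COMPONENT)
    @interface Size { int value(); }

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.RECORD_COMPONENT)
    @interface Unsigned {}

    record Event(@Unsigned int pid,
                 @Size(256) String filename,
                 @Size(16) String comm) {}

    record Member(String name, String cType, int offset, int size) {}

    static List<Member> members(Class<? extends Record> type) {
        List<Member> result = new ArrayList<>();
        int position = 0;
        // record components are returned in declaration order
        for (RecordComponent component : type.getRecordComponents()) {
            int size;
            int alignment;
            String cType;
            if (component.getType() == int.class) {
                size = 4;
                alignment = 4;
                cType = component.isAnnotationPresent(Unsigned.class) ? "u32" : "s32";
            } else if (component.getType() == String.class
                    && component.isAnnotationPresent(Size.class)) {
                size = component.getAnnotation(Size.class).value();
                alignment = 1; // strings are char arrays, aligned like char
                cType = "char[" + size + "]";
            } else {
                throw new IllegalArgumentException(
                        "unsupported member " + component.getName());
            }
            // align the position, place the member, and advance
            position = (position + alignment - 1) / alignment * alignment;
            result.add(new Member(component.getName(), cType, position, size));
            position += size;
        }
        return result;
    }

    public static void main(String[] args) {
        members(Event.class).forEach(System.out::println);
    }
}
```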

This results in a much cleaner RingSample version, named TypeProcessingSample:

@BPF
public abstract class TypeProcessingSample extends BPFProgram {

    static final String EBPF_PROGRAM = """...""";

    private static final int FILE_NAME_LEN = 256;
    private static final int TASK_COMM_LEN = 16;

    @Type
    record Event(@Unsigned int pid, 
                 @Size(FILE_NAME_LEN) String filename, 
                 @Size(TASK_COMM_LEN) String comm) {}


    public static void main(String[] args) {
        try (TypeProcessingSample program = BPFProgram.load(TypeProcessingSample.class)) {
            program.autoAttachProgram(
              program.getProgramByName("kprobe__do_sys_openat2"));

            // get the generated struct type
            var eventType = program.getTypeForClass(Event.class);

            var ringBuffer = program.getRingBufferByName("rb", eventType,
             (buffer, event) -> {
                System.out.printf("do_sys_openat2 called by:%s file:%s pid:%d\n", 
                                  event.comm(), event.filename(), event.pid());
            });
            while (true) {
                ringBuffer.consumeAndThrow();
            }
        }
    }
}

The annotation processor currently supports the following members in records:

integer types (int, long, …), optionally annotated with @Unsigned if unsigned
String types, annotated with @Size to specify the size
Other @Type annotated types in the same scope
@Type.Member annotated member to specify the BPFType directly

You can find the up-to-date list in the documentation for the Type annotation.

Conclusion

We have to model all C types that we use in both eBPF and Java in Java, too; this includes placing the different members of structs in memory and keeping them properly aligned. We saw that the general algorithm behind the layouting is straightforward. This algorithm can be used in the hello-ebpf library with an annotation processor to make writing eBPF applications more concise and less error-prone.

I hope you liked this introduction to struct layouts. See you in two weeks when we start supporting more features of libbpf.

Hello eBPF: Ring buffers in libbpf (6)

Posted on March 12, 2024 by Johannes Bechberger

Welcome back to my blog series on eBPF. Two weeks ago, I got started in using libbpf instead of libbcc. This week, I show you how to use ring buffers, port the code from Ansil H’s blog post eBPF for Linux Admins: Part IX from C to Java, and add tests to the underlying map implementation.

My libbpf-based implementation advances more slowly than the bcc-based one, as I thoroughly test all added functionality and develop a proper Java API, not just a clone.

But first, what are eBPF ring buffers?

Ring buffers

In Hello eBPF: Recording data in event buffers (3), I showed you how to use perf event buffers, which are the predecessor to ring buffers and allow us to communicate between kernel and user-land using events. But perf buffers have problems:

It works great in practice, but due to its per-CPU design it has two major short-comings that prove to be inconvenient in practice: inefficient use of memory and event re-ordering.

To address these issues, starting from Linux 5.8, BPF provides a new BPF data structure (BPF map): BPF ring buffer (ringbuf). It is a multi-producer, single-consumer (MPSC) queue and can be safely shared across multiple CPUs simultaneously.
BPF ring buffer by Andrii Nakryiko

Ring buffers are still circular buffers:

Their usage is similar to the perf event buffers we’ve seen before. The significant difference is that we implemented the perf event buffers using the libbcc-based eBPF code, which made creating a buffer easy:

BPF_PERF_OUTPUT(rb);

Libbcc compiles the C code with macros. With libbpf, we have to write all that ourselves:

// anonymous struct assigned to rb variable
struct
{
  // specify the type, eBPF specific syntax
  __uint (type, BPF_MAP_TYPE_RINGBUF);
  // specify the size of the buffer
  // has to be a multiple of the page size 
  __uint (max_entries, 256 * 4096);
} rb SEC (".maps") /* placed in maps section */;

You can find more on this syntax in the mail for the patch that introduced it and in the ebpf-docs.

On the eBPF side in the kernel, ring buffers have several important helper functions that allow their easy use:

bpf_ringbuf_output

long bpf_ringbuf_output(void *ringbuf, void *data, __u64 size, __u64 flags)

Copy the specified number of bytes of data into the ring buffer and send notifications to user-land. This function returns a negative number on error and zero on success.

bpf_ringbuf_reserve

void* bpf_ringbuf_reserve(void *ringbuf, __u64 size, __u64 flags)

Reserve a specified number of bytes in the ring buffer and return a pointer to the start. This lets us write events directly into the ring buffer’s memory (source).

bpf_ringbuf_submit

void bpf_ringbuf_submit(void *data, __u64 flags)

Submit the reserved ring buffer event (reserved via bpf_ringbuf_reserve).

You might assume that you can build your own bpf_ringbuf_output with just bpf_ringbuf_reserve and bpf_ringbuf_submit and you’re correct. When we look into the actual implementation of bpf_ringbuf_output, we see that it is not that much more:

BPF_CALL_4(bpf_ringbuf_output, struct bpf_map *, map, 
           void *, data, u64, size,
	   u64, flags)
{
  struct bpf_ringbuf_map *rb_map;
  void *rec;
        
  // check flags
  if (unlikely(flags & ~(BPF_RB_NO_WAKEUP | BPF_RB_FORCE_WAKEUP)))
    return -EINVAL;

  // reserve the memory
  rb_map = container_of(map, struct bpf_ringbuf_map, map);
  rec = __bpf_ringbuf_reserve(rb_map->rb, size);
  if (!rec)
    return -EAGAIN;

  // copy the data into the reserved memory
  memcpy(rec, data, size);

  // equivalent to bpf_ringbuf_submit(rec, flags)
  bpf_ringbuf_commit(rec, flags, false /* discard */);
  return 0;
}
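
The relationship between the three helpers can also be modeled in user-land. The following is a deliberately simplified, single-threaded toy model in Java, not how the kernel implements ring buffers: there are no record headers, no wake-ups, and no concurrency, and all names are made up. It only shows that output is reserve, copy, and submit combined, and that the consumer only sees submitted records.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

/** Toy model of reserve/submit/output; not the kernel's implementation. */
class RingBufferSketch {

    static final int EAGAIN = 11; // errno value for "try again" on Linux

    /** A reserved region that becomes visible to the consumer once committed. */
    static final class Slot {
        final byte[] data;
        boolean committed;

        Slot(int size) {
            data = new byte[size];
        }
    }

    private final int capacity;
    private int used;
    private final Deque<Slot> slots = new ArrayDeque<>();

    RingBufferSketch(int capacity) {
        this.capacity = capacity;
    }

    /** Reserves size bytes, or returns null if the buffer is too full. */
    Slot reserve(int size) {
        if (used + size > capacity) {
            return null;
        }
        used += size;
        Slot slot = new Slot(size);
        slots.addLast(slot);
        return slot;
    }

    /** Makes a reserved slot visible to the consumer. */
    void submit(Slot slot) {
        slot.committed = true;
    }

    /** reserve + copy + submit, mirroring the structure of bpf_ringbuf_output. */
    int output(byte[] data) {
        Slot slot = reserve(data.length);
        if (slot == null) {
            return -EAGAIN;
        }
        System.arraycopy(data, 0, slot.data, 0, data.length);
        submit(slot);
        return 0;
    }

    /** Hands all committed records to the callback, in order; returns their count. */
    int consume(Consumer<byte[]> callback) {
        int count = 0;
        // stop at the first record that is reserved but not yet submitted
        while (!slots.isEmpty() && slots.peekFirst().committed) {
            Slot slot = slots.pollFirst();
            used -= slot.data.length;
            callback.accept(slot.data);
            count++;
        }
        return count;
    }
}
```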

bpf_ringbuf_discard

void bpf_ringbuf_discard(void *data, __u64 flags)

Discard the reserved ring buffer event.

bpf_ringbuf_query

__u64 bpf_ringbuf_query(void *ringbuf, __u64 flags)

Query various characteristics of provided ring buffer. What exactly is queries is determined by flags:

BPF_RB_AVAIL_DATA: Amount of data not yet consumed.

BPF_RB_RING_SIZE: The size of ring buffer.

BPF_RB_CONS_POS: Consumer position (can wrap around).

BPF_RB_PROD_POS: Producer(s) position (can wrap around).

Data returned is just a momentary snapshot of actual values and could be inaccurate, so this facility should be used to power heuristics and for reporting, not to make 100% correct calculation.

Return: Requested value, or 0, if flags are not recognized.
bpf-Helpers man-Page

You can find more information in these resources:

eBPF Docs by Dylan Reimerink
official Linux eBPF documentation
bpf-helpers(7) man-page
Linux kernel source code, as you saw above, can give us insights that no documentation can provide us with

Ring Buffer eBPF Example

Now that I’ve shown you what ring buffers are on the eBPF side, we can look at the eBPF example that writes an event for every openat call, capturing the process id, filename, and process name. It is adapted from Ansil H’s blog post eBPF for Linux Admins: Part IX:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <string.h>
                
#define TARGET_NAME "sample_write"
#define MAX_ENTRIES 10
#define FILE_NAME_LEN 256
#define TASK_COMM_LEN 16
                
// Structure to store the data that we want to pass to user
struct event
{
  u32 e_pid;
  char e_filename[FILE_NAME_LEN];
  char e_comm[TASK_COMM_LEN];
};
                
// eBPF map reference
struct
{
  __uint (type, BPF_MAP_TYPE_RINGBUF);
  __uint (max_entries, 256 * 4096);
} rb SEC (".maps");
                
// The ebpf auto-attach logic needs the SEC
SEC ("kprobe/do_sys_openat2")
int kprobe__do_sys_openat2(struct pt_regs *ctx)
{
  char filename[256];
  char comm[TASK_COMM_LEN] = { };
  struct event *evt;
  const char fmt_str[] = "do_sys_openat2 called by:%s file:%s pid:%d";
                
  // Reserve the ring-buffer
  evt = bpf_ringbuf_reserve(&rb, sizeof (struct event), 0);
  if (!evt) {
      return 0;
  }
  // Get the PID: the lower 32 bits of the return value hold the
  // kernel pid (thread id), the upper 32 bits the tgid (process id)
  evt->e_pid = bpf_get_current_pid_tgid();
                
  // Read the filename from the second argument
  // The x86 arch/ABI have first argument 
  // in di and second in si registers (man syscall)
  bpf_probe_read(evt->e_filename, sizeof(filename), 
        (char *) ctx->si);
                
  // Read the current process name
  bpf_get_current_comm(evt->e_comm, sizeof(comm));
            
  bpf_trace_printk(fmt_str, sizeof(fmt_str), evt->e_comm,
        evt->e_filename, evt->e_pid);
  // Also send the same message to the ring-buffer
  bpf_ringbuf_submit(evt, 0);
  return 0;
}
                
char _license[] SEC ("license") = "GPL";

Ring Buffer Java Example

With this in hand, we can implement the RingSample using the newly added functionality in hello-ebpf:

@BPF
public abstract class RingSample extends BPFProgram {

  static final String EBPF_PROGRAM = """
              // ...
            """;

  private static final int FILE_NAME_LEN = 256;
  private static final int TASK_COMM_LEN = 16;
  
  // event record
  record Event(@Unsigned int pid, 
               String filename, 
               @Size(TASK_COMM_LEN) String comm) {}

  // define the event records layout
  private static final BPFStructType<Event> eventType = 
          new BPFStructType<>("rb", List.of(
          new BPFStructMember<>("e_pid", 
                  BPFIntType.UINT32, 0, Event::pid),
          new BPFStructMember<>("e_filename", 
                  new StringType(FILE_NAME_LEN), 
                  4, Event::filename),
          new BPFStructMember<>("e_comm", 
                  new StringType(TASK_COMM_LEN), 
                  4 + FILE_NAME_LEN, Event::comm)
  ), new AnnotatedClass(Event.class, List.of()), 
  fields -> new Event((int)fields.get(0),
          (String)fields.get(1), (String)fields.get(2)));

  public static void main(String[] args) {
    try (RingSample program = BPFProgram.load(RingSample.class)) {
      // attach the kprobe
      program.autoAttachProgram(
              program.getProgramByName("kprobe__do_sys_openat2"));
      // obtain the ringbuffer
      // and write a message every time a new event is obtained
      var ringBuffer = program.getRingBufferByName("rb", eventType, 
              (buffer, event) -> {
        System.out.printf("do_sys_openat2 called by:%s file:%s pid:%d\n", 
                event.comm(), event.filename(), event.pid());
      });
      while (true) {
        // consume and throw any captured
        // Java exception from the event handler
        ringBuffer.consumeAndThrow();
      }
    }
  }
}

You can run the example via ./run_bpf.sh RingSample:

do_sys_openat2 called by:C1 CompilerThre file:/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/app.slice/snap.intellij-idea-community.intellij-idea-community-a46a168b-28d0-4bb9-9e15-f3a966353efe.scope/memory.max pid:69817
do_sys_openat2 called by:C1 CompilerThre file:/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/app.slice/snap.intellij-idea-community.intellij-idea-community-a46a168b-28d0-4bb9-9e15-f3a966353efe.scope/memory.max pid:69812
do_sys_openat2 called by:java file:/home/i560383/.sdkman/candidates/java/21.0.2-sapmchn/lib/libjimage.so pid:69797

Conclusion

The libbpf part of hello-ebpf keeps evolving. With this blog post, I added support for the first kind of eBPF maps and ring buffers, with a simplified Java API and five unit tests. I’ll most likely work on the libbpf part in the future, as it is far easier to work with than libbcc.

Thanks for joining me on this journey to create a proper Java API for eBPF. Feel free to try the examples for yourself or even write new ones and join the discussions on GitHub. See you in my next blog post about my journey to Canada or in two weeks for the next installment of this series.

Hello eBPF: First steps with libbpf (5)

Posted on February 26, 2024 by Johannes Bechberger

Welcome back to my blog series on eBPF. Two weeks ago, I showed you how to write your own eBPF application using my hello-ebpf library based on libbcc. This week, I show you why using libbcc is not the best idea and start working with the newer libbpf.

With my current libbcc-based approach, we essentially embed the eBPF program as a string into our applications and compile it on the fly for every run:

public class HelloWorld {
    public static void main(String[] args) {
        try (BPF b = BPF.builder("""
                int kprobe__sys_clone(void *ctx) {
                   bpf_trace_printk("Hello, World!");
                   return 0;
                }
                """).build()) {
            b.trace_print();
        }
    }
}

Problems with Libbcc

Using libbcc and porting the Python wrapper made it easy to start developing a user-land Java library and offers some syntactic sugar, but it has major disadvantages, to quote Andrii Nakryiko:

Clang/LLVM combo is a big library, resulting in big fat binaries that need to be distributed with your application.

Clang/LLVM combo is resource-heavy, so when you are compiling BPF code at start up, you’ll use a significant amount of resources, potentially tipping over a carefully balanced production workfload. And vice versa, on a busy host, compiling a small BPF program might take minutes in some cases.

BPF program testing and development iteration is quite painful as well, as you are going to get even most trivial compilation errors only in run-time, once you recompile and restart your user-space control application. This certainly increases friction and is not helping to iterate fast.

BPF Portability and CO-RE by Andrii Nakryiko

Additionally, the libbcc binaries in the official Ubuntu package repositories are outdated, so we’re accumulating technical debt using them.

BPF-based Library

So what is the alternative? We compile the embedded C code in our application to eBPF bytecode at build time using a custom annotation processor and load the bytecode using libbpf at run-time:

This allows us to create self-contained JARs that will eventually neatly package our eBPF application.

With this new chapter of the hello-ebpf project, I am trying to create a proper Java API that

builds on top of libbpf
isn’t bound to mimic the Python API, thus making it easier to understand for Java developers
is tested with a growing number of tests so that it is safe to use
prefers usability (and a small API) over speed

The annotation processor for this lives in the bpf-processor, and the central part of the library is in the bpf folder. It is in its earliest stages, but you can expect more features and tests in the following months.

HelloWorld Example

Writing programs with libbpf is not too dissimilar to using my libbcc wrapper:

@BPF // annotation to trigger the BPF annotation processor
public abstract class HelloWorld extends BPFProgram {
    
    // eBPF program code that is compiled at build
    // time using clang
    static final String EBPF_PROGRAM = """
            #include "vmlinux.h"
            #include <bpf/bpf_helpers.h>
            #include <bpf/bpf_tracing.h>
                            
            SEC ("kprobe/do_sys_openat2")
            int kprobe__do_sys_openat2(struct pt_regs *ctx){                                                             
                bpf_printk("Hello, World from BPF and more!");
                return 0;
            }
                            
            char _license[] SEC ("license") = "GPL";
            """;

    public static void main(String[] args) {
        // load an instance of the HelloWorld implementation
        try (HelloWorld program = BPFProgram.load(HelloWorld.class)) {
            // attach to the kprobe
            program.autoAttachProgram(
                program.getProgramByName("kprobe__do_sys_openat2"));
            program.tracePrintLoop(f -> 
                String.format("%d: %s: %s", (int)f.ts(), f.task(), f.msg()));
        }
    }
}

Running this class via ./run_bpf.sh HelloWorld will then print the following:

3385: irqbalance: Hello, World from BPF and more!
3385: irqbalance: Hello, World from BPF and more!
3385: irqbalance: Hello, World from BPF and more!
3385: irqbalance: Hello, World from BPF and more!
3385: irqbalance: Hello, World from BPF and more!
3385: irqbalance: Hello, World from BPF and more!
3385: irqbalance: Hello, World from BPF and more!
3385: C2 CompilerThre: Hello, World from BPF and more!

The annotation processor created an implementation of the HelloWorld class, which overrides the getByteCode method:

public final class HelloWorldImpl extends HelloWorld {
    /**
     * Base64 encoded gzipped eBPF byte-code
     */
    private static final String BYTE_CODE = "H4sIAA...n5q6hfQNFV+sgDAAA=";

    @Override
    public byte[] getByteCode() {
        return Util.decodeGzippedBase64(BYTE_CODE);
    }
}
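
A helper like Util.decodeGzippedBase64 can be built with the JDK alone. The following is a sketch of how such a helper could look, together with the matching encoder; I made up the class and method names, and the actual implementation in hello-ebpf may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical sketch of a gzip + Base64 codec for embedding byte-code. */
class ByteCodeCodecSketch {

    /** Compresses the bytes with gzip and encodes the result as Base64. */
    static String encode(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // the gzip stream is closed here, so the output is complete
        return Base64.getEncoder().encodeToString(out.toByteArray());
    }

    /** Reverses encode: Base64-decodes and then decompresses the bytes. */
    static byte[] decode(String encoded) {
        byte[] zipped = Base64.getDecoder().decode(encoded);
        try (GZIPInputStream gzip =
                     new GZIPInputStream(new ByteArrayInputStream(zipped))) {
            return gzip.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] byteCode = {0x7f, 'E', 'L', 'F'}; // stand-in for real byte-code
        String encoded = encode(byteCode);
        System.out.println(encoded);
        System.out.println(decode(encoded).length + " bytes decoded");
    }
}
```

Every gzip stream starts with the bytes 0x1f 0x8b 0x08, which is why the encoded string starts with H4sI, just like the BYTE_CODE constant above.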

Compiler Errors

But what happens when you make a mistake in your eBPF program, for example, not writing a semicolon after the bpf_printk call? Then, the annotation processor throws an error at build-time and prints the following error message when calling mvn package:

Processing BPFProgram: me.bechberger.ebpf.samples.HelloWorld
Obtaining vmlinux.h header file
Could not compile eBPF program
HelloWorld.java:[19,66]  error: expected ';' after expression
    bpf_printk("Hello, World from BPF and more!")
                                                 ^
                                                 ;
1 error generated.

The annotation processor compiles the eBPF program using Clang and post-processes the error messages to show the location in the Java program. Using libbcc, we only get this error at run-time, which makes finding these issues far harder.
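
The post-processing step can be pictured roughly as follows. This is a sketch under assumptions, not the code of the annotation processor: the class ClangErrorMapperSketch, the file name prog.c, and the idea of passing the first Java line of the text block and the stripped indentation as plain numbers are all hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the real annotation processor may work differently
final class ClangErrorMapperSketch {

    // the usual Clang diagnostic prefix "<file>:<line>:<column>: <message>"
    private static final Pattern LOCATION =
        Pattern.compile("^(.+?):(\\d+):(\\d+): (.*)$");

    /**
     * Rewrites the location of a Clang diagnostic so that it points into
     * the Java file that contains the C code as a text block.
     *
     * @param firstJavaLine  Java line that holds the first line of C code
     * @param strippedIndent number of columns removed from every C line
     */
    static String toJavaLocation(String clangLine, String javaFile,
                                 int firstJavaLine, int strippedIndent) {
        Matcher m = LOCATION.matcher(clangLine);
        if (!m.matches()) {
            // source excerpts and caret lines are passed through unchanged
            return clangLine;
        }
        int javaLine = firstJavaLine + Integer.parseInt(m.group(2)) - 1;
        int javaColumn = strippedIndent + Integer.parseInt(m.group(3));
        return "%s:[%d,%d] %s".formatted(javaFile, javaLine, javaColumn,
                                         m.group(4));
    }

    public static void main(String[] args) {
        // made-up input values, chosen to land on line 19, column 66
        System.out.println(toJavaLocation(
            "prog.c:5:50: error: expected ';' after expression",
            "HelloWorld.java", 15, 16));
    }
}
```

The essential point is that both the line and the column need an offset, because the C code sits indented somewhere in the middle of the Java file.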

Conclusion

Using libbpf instead of libbcc has many advantages: Smaller, self-contained JARs, better developer support, and a more modern library. The hello-ebpf project will evolve to focus on libbpf to become a fully functional and tested eBPF user-land library. Using an annotation processor offers so many possibilities, so stay tuned.

Thanks for joining me on this journey to create a proper Java API for eBPF. I’ll see you in two weeks for the next installment in this series, and possibly before for a trip report on my current travels.

This article is part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone. This article was written in Canada, thanks to ConFoo and Theresa Mammarella, who made this trip possible. Inspiration came from Ansil H’s series on eBPF.

Hello eBPF: Tail calls and your first eBPF application (4)

Posted on February 12, 2024 by Johannes Bechberger

Welcome back to my blog series on eBPF. Two weeks ago, I showed you how to use perf event buffers to stream data from the eBPF program to the Java application. This week, we will finish chapter 2 of the Learning eBPF book, learn how to use tail calls and the hello-ebpf project as a library and implement one of the book’s exercises. We start with function and tail calls:

Function Calls

Regular C programs are divided into functions that call each other; so far in this series, all our eBPF programs consisted of just a single function that calls kernel functions. But can we call other eBPF functions? At the end of 2017, Daniel Borkmann et al. introduced the ability to call other functions defined in eBPF:

It allows for better optimized code and finally allows to introduce the core bpf libraries that can be reused in different projects, since programs are no longer limited by single elf file. With function calls bpf can be compiled into multiple .o files.
bpf: introduce function calls by Alexei Starovoitov

Before this change, you essentially had to inline all functions. There is just one problem with real function calls: Every call takes space on the stack for its call frame, which contains the parameters and local variables of the called function.

The maximum stack size is limited to 512 bytes, so every call frame counts for larger eBPF programs. Modern compilers will, therefore, try to inline the function calls and save space. To reduce the required stack memory, we have essentially two options besides inlining: We can either use static variables or tail calls. Andrii Nakryiko describes the former:

Starting with Linux 5.2, d8eca5bbb2be (“bpf: implement lookup-free direct value access for maps”) adds support for BPF global (and static) variables, which we are going to use here to get rid of on-the-stack array.
BPF tips & tricks: the guide to bpf_trace_printk() and bpf_printk()

Declaring a variable as static, e.g. static int x, means that the value is stored as a global variable, existing once per program run. This is not a problem if a function doesn’t transitively call itself, which is true for all functions you would typically want to write in eBPF.

Tail Calls

Now to tail calls. If the function calls another function directly before returning (or as an argument to the return statement), then the call frames can be replaced. This is called a tail call and avoids growing the stack. In eBPF, it is possible to tail call one eBPF program (entry function that gets passed a context) from another program:

A tail call is achieved by storing the other program in a program array, which maps a 4-byte int to an eBPF program. The kernel function bpf_tail_call(ctx, program_array, index) can then be used to call a specific program:

This special helper is used to trigger a “tail call”, or in other words, to jump into another eBPF program. The same stack frame is used (but values on stack and in registers for the caller are not accessible to the callee). This mechanism allows for program chaining, either for raising the maximum number of available eBPF instructions, or to execute given programs in conditional blocks. For security reasons, there is an upper limit to the number of successive tail calls that can be performed.

Upon call of this helper, the program attempts to jump into a program referenced at index index in prog_array_map, a special map of type BPF_MAP_TYPE_PROG_ARRAY, and passes ctx, a pointer to the context.
BPF-HELPERS(7)

This function only returns when it encounters an error, returning a negative error code.

Tail Call Example

Let’s create, as an example, an entry function that is triggered for every system call and tail calls another function using the stored eBPF programs for each system call number, based on the example in the Learning eBPF book:

BPF_PROG_ARRAY(syscall, 300);

int hello(struct bpf_raw_tracepoint_args *ctx) {
    // args[1] is here the syscall number
    int nr = ctx->args[1];
    // this is the BCC syntax for bpf_tail_call
    syscall.call(ctx, nr);
    // we only reach the print if the
    // syscall number is not associated
    // with a function
    bpf_trace_printk("Another syscall: %d", nr);
    return 0;
}

int hello_exec(void *ctx) {
    bpf_trace_printk("Executing a program");
    return 0;
}

int hello_timer(struct bpf_raw_tracepoint_args *ctx) {
    int nr = ctx->args[1];
    switch (nr) {
        case 222:
            bpf_trace_printk("Creating a timer");
            break;
        case 226:
            bpf_trace_printk("Deleting a timer");
            break;
        default:
            bpf_trace_printk("Some other timer operation");
            break;
    }
    return 0;
}

int ignore_nr(void *ctx) {
    return 0;
}

We can now store a function for every system call in the syscall program array, register the hello function for every system call, and tail call the specified function for every system call number.
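
The control flow is easier to see in a plain user-land model. The following Java sketch only imitates the semantics: a filled slot means control moves to the stored program and never comes back, an empty slot means the caller simply continues. The class TailCallModel and its methods are invented for this illustration and are not part of hello-ebpf or bcc; 59 is the execve number on x86_64, 222 is taken from the C code above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

// User-land model of the tail call semantics; hypothetical names throughout
final class TailCallModel {
    // plays the role of BPF_PROG_ARRAY(syscall, 300)
    private final IntConsumer[] progArray = new IntConsumer[300];
    // stands in for the trace pipe that bpf_trace_printk writes to
    final List<String> log = new ArrayList<>();

    void set(int syscallNr, IntConsumer program) {
        progArray[syscallNr] = program;
    }

    /** Models syscall.call(ctx, nr): true means "jumped and did not return". */
    private boolean tailCall(int nr) {
        if (nr < 0 || nr >= progArray.length || progArray[nr] == null) {
            return false; // the helper failed, the caller keeps running
        }
        progArray[nr].accept(nr);
        return true;
    }

    /** Models the hello entry function. */
    void hello(int nr) {
        if (tailCall(nr)) {
            return; // in eBPF, the rest of hello is never executed
        }
        log.add("Another syscall: " + nr);
    }

    public static void main(String[] args) {
        TailCallModel model = new TailCallModel();
        model.set(59, nr -> model.log.add("Executing a program"));
        model.set(222, nr -> model.log.add("Creating a timer"));
        model.hello(59);
        model.hello(222);
        model.hello(1); // no entry, so hello itself logs the call
        model.log.forEach(System.out::println);
    }
}
```

In the real program, the jump replaces the current call frame, which is what the explicit return after a successful tailCall stands for here.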

You can find this example in the hello-ebpf repository. This includes all the Java code required to attach the eBPF program and log the result. I could just show you the example code, but let’s do something different this time:

Tail Example Application

I recently released the hello-ebpf library, consisting mainly of the bcc and annotation libraries, in Sonatype’s snapshot repository. Let’s use these releases to create our first application. This first application is a version of the HelloTail example from before.

We start by cloning my new sample-bcc-project, which we subsequently modify. This sample project contains essentially the following four parts:

src/main/java/Main.java: Main class for our Maven-based build
pom.xml: Maven pom that uses the snapshot repository to depend on the me.bechberger.bcc library. It also allows you to build a JAR with all dependencies included via mvn package.
run.sh: runs the built JAR with the required flags --enable-preview --enable-native-access=ALL-UNNAMED
README.md: Information on how to run the program and more.

We only have to change the Main class to develop our application, adding our system-call-logging-related code. Our application should log only execve and itimer-related system calls when passed the --skip-others flag on the command line. So, we start with implementing the argument parsing:

record Arguments(boolean skipOthers) {
    static Arguments parseArgs(String[] args) {
        boolean skipOthers = false;
        if (args.length > 0) {
            if (args.length == 1 && args[0].equals("--skip-others")) {
                skipOthers = true;
            } else {
                // print usage for all other arguments, this
                // includes --help
                System.err.println("""
                Usage: app [--skip-others]
                    
                   --skip-others: Only log execve and itimer system calls
                """);
                System.exit(1);
            }
        }
        return new Arguments(skipOthers);
    }
}

We then define the eBPF program, as well as some system calls that come up a lot, as static variables:

static final String EBPF_PROGRAM = """
            ...
            """;

static final int[] IGNORED_SYSCALLS = new int[]{
        21, 22, 25, 29, 56, 57, 63, 64, 66,
        72, 73, 79, 98, 101, 115, 131, 134,
        135, 139, 172, 233, 280, 291};

Now to the important part: The main and run methods that contain the central part of our application:

public static void main(String[] args) {
    run(Arguments.parseArgs(args));
}

static void run(Arguments args) {
    try (var b = BPF.builder(EBPF_PROGRAM).build()) {
        // attach to the tracepoint that is
        // called at the start of every system call
        b.attach_raw_tracepoint("sys_enter", "hello");
        
        // get the function ids of all defined functions
        var ignoreFn = b.load_raw_tracepoint_func("ignore_nr");
        var execFn = b.load_raw_tracepoint_func("hello_exec");
        var timerFn = b.load_raw_tracepoint_func("hello_timer");
        
        // obtain the program array
        var progArray = b.get_table("syscall", 
            BPFTable.ProgArray.createProvider());
        
        // map the system call execve to the hello_exec function
        progArray.set(Syscalls.getSyscall("execve").number(), 
                      execFn);
        
        // map the itimer system calls to the hello_timer function
        for (String syscall : new String[]{
                "timer_create", "timer_gettime",
                "timer_getoverrun", "timer_settime",
                "timer_delete"}) {
            progArray.set(Syscalls.getSyscall(syscall).number(), 
                          timerFn);
        }

        // ignore some system calls that come up a lot
        for (int i : IGNORED_SYSCALLS) {
            progArray.set(i, ignoreFn);
        }
        
        // print the trace using a custom formatter
        b.trace_print(f -> formatTrace(f, args.skipOthers));
    }
}

This code uses the Syscalls class from the bcc library to map system calls to their number. The only part left now is the custom formatter, which takes care of the --skip-others option:

static @Nullable String formatTrace(BPF.TraceFields f, 
  boolean skipOthers) {       
    String another = "Another syscall: ";                                          
    String line = f.line().replace("bpf_trace_printk: ", "");                      
    // replace other syscall with their names                                      
    if (line.contains(another)) {                                                  
        // skip these lines if --skip-others is passed                             
        if (skipOthers) {                                                          
            return null;                                                           
        }                                                                          
        var syscall =                                                              
                Syscalls.getSyscall(                                               
                        Integer.parseInt(                                          
                                line.substring(                                    
                                        line.indexOf(another) +                    
                                                another.length())));               
        return line.replace(another + syscall.number(),                            
                another + syscall.name());                                         
    }                                                                              
    return line;                                                                   
}

This gives us an application that we can build via mvn package, and run:

> sudo -s PATH=$PATH                                                   
> ./run.sh --skip-others                                               
     ps-26459   [031] ...2. 91897.197604: Executing a program          
    git-26551   [052] ...2. 91935.368240: Executing a program          
    git-26553   [031] ...2. 91935.373159: Executing a program          
    git-26555   [016] ...2. 91935.378132: Executing a program          
  <...>-26558   [053] ...2. 91935.383839: Executing a program          
   tail-26561   [004] ...2. 91935.388621: Executing a program          
    git-26562   [099] ...2. 91935.388970: Executing a program
   ...          
> ./run.sh                                                      
  <...>-3277    [122] ...2. 91946.796677: Another syscall: recvmsg     
   Xorg-3045    [121] ...2. 91946.796678: Another syscall: setitimer   
  <...>-26461   [074] ...2. 91946.796680: Another syscall: readlink    
   Xorg-3045    [121] ...2. 91946.796680: Another syscall: epoll_wait  
  <...>-3457    [068] ...2. 91946.796681: Another syscall: recvmsg     
  <...>-3277    [122] ...2. 91946.796682: Another syscall: recvmsg     
  <...>-26461   [074] ...2. 91946.796684: Another syscall: readlink    
  <...>-3277    [122] ...2. 91946.796685: Another syscall: recvmsg     
  <...>-3457    [068] ...2. 91946.796689: Another syscall: recvmsg     
  <...>-3277    [122] ...2. 91946.796690: Another syscall: recvmsg
  ...

You can run this either on a Linux machine with Java 21 and libbcc installed or on Mac using the Lima VM:

> limactl start hello-ebpf.yaml
> limactl shell hello-ebpf
> sudo -s
> ./run.sh
# ...

You can find more information and the whole implementation in the System Call Logger branch of the sample-bcc-project.

Conclusion

In this blog post, I showed you how to use tail calls and develop your first standalone eBPF application using the hello-ebpf library. Most of the bcc implementation was present two weeks ago when I wrote my previous blog post of this series, but now it’s slightly more polished. The releases of the hello-ebpf libraries are currently live in the snapshot repository.

Now, on to you: There are exercises at the end of chapter 2 of the Learning eBPF book. Can you implement them on your own? Clone the sample-bcc-project and give it a try. I’m happy to showcase any cool forks in my next blog post.

Thanks for joining me on this journey to create a proper Java API for eBPF. I’m looking forward to finishing porting the whole bcc API and starting with the next iteration of this project. I’ll keep you posted; see you in my next post.

This article is part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone.

Is JDWP’s onjcmd feature worth using?

Posted on February 9, 2024 by Johannes Bechberger

A few months ago, I told you about the onjcmd feature in my blog post Level-up your Java Debugging Skills with on-demand Debugging (which is coming to JavaLand 2024). The short version is that adding onjcmd=y to the list of JDWP options allows you to delay accepting the incoming connection request in the JDWP agent until jcmd <JVM pid> VM.start_java_debugging is called.

The main idea is that the JDWP agent

only listens on the debugging port after it is triggered, which could have some security benefits
and that the JDWP agent causes less overhead while waiting, compared to just accepting connections from the beginning.

The first point is debatable; one can find arguments for and against it. But for the second point, we can run some benchmarks. After renewed discussions, I started benchmarking to conclude whether the onjcmd feature improves on-demand debugging performance. Spoiler alert: It doesn’t.

Benchmarks

As for the benchmarks, I chose to run the Renaissance benchmark suite (version 0.15.0):

Renaissance is a modern, open, and diversified benchmark suite for the JVM, aimed at testing JIT compilers, garbage collectors, profilers, analyzers and other tools.

Renaissance is a benchmarking suite that contains a range of modern workloads, comprising of various popular systems, frameworks and applications made for the JVM.

Renaissance benchmarks exercise a range of programming paradigms, including concurrent, parallel, functional and object-oriented programming.
RENAISSANCE.DEV

Renaissance typically runs the sub-benchmarks in multiple iterations. Still, I decided to run the sub-benchmarks just once per Renaissance run (via -r 1) and instead run Renaissance itself ten times using hyperfine to get a proper run-time distribution. I compared three different executions of Renaissance for this blog post:

without JDWP: Running Renaissance without any debugging enabled, to have an appropriate baseline, via java -jar renaissance.jar all -r 1
with JDWP: Running Renaissance in debugging mode, with the JDWP agent accepting debugging connections the whole time without suspending the JVM, via java -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005 -jar renaissance.jar all -r 1
with onjcmd: Running Renaissance in debugging mode, with the JDWP agent accepting debugging connections only after the jcmd call without suspending the JVM, via java -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,onjcmd=y,address=*:5005 -jar renaissance.jar all -r 1

Remember that we never start a debugging session or use jcmd, as we’re only interested in the performance of the JVM while waiting for a debugging connection in the JDWP agent.

Yes, I know that Renaissance uses different iteration numbers for the sub-benchmarks, but this should not affect the overall conclusions from the benchmark.

Results

Now to the results. For a current JDK 21 on my Ubuntu 23.10 machine with a ThreadRipper 3995WX CPU, hyperfine reports the following results:

Benchmark 1: without JDWP
  Time (mean ± σ):     211.075 s ±  1.307 s    [User: 4413.810 s, System: 1438.235 s]
  Range (min … max):   209.667 s … 213.361 s    10 runs

Benchmark 2: with JDWP
  Time (mean ± σ):     218.985 s ±  1.924 s    [User: 4533.024 s, System: 1133.126 s]
  Range (min … max):   216.673 s … 222.249 s    10 runs

Benchmark 3: with onjcmd
  Time (mean ± σ):     219.469 s ±  1.185 s    [User: 4537.213 s, System: 1181.856 s]
  Range (min … max):   217.824 s … 221.316 s    10 runs

Summary
  "without JDWP" ran
    1.04 ± 0.01 times faster than "with JDWP"
    1.04 ± 0.01 times faster than "with onjcmd"

You can see that the run-time difference between “with JDWP” and “with onjcmd” is 0.5s, way below the standard deviations of both benchmarks. Plotting the benchmark results using box plots visualizes this fact:

Or, more analytically, Welch’s t-test doesn’t rule out the possibility of both benchmarks producing the same run-time distribution with p=0.5. There is, therefore, no measurable effect on the performance if we use the onjcmd feature. But what we do notice is that enabling the JDWP agent results in an increase in the run-time by 4%.
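
For readers who want to redo the arithmetic, here is a small sketch of the two quantities behind Welch's t-test, computed directly from the hyperfine summary. It assumes that the σ reported by hyperfine can be used as the sample standard deviation; the class WelchSketch and the record Sample are names made up for this example. Turning the t statistic into a p-value needs the CDF of Student's t distribution, which is left to a statistics library.

```java
// Sketch of Welch's t statistic and the Welch-Satterthwaite degrees of
// freedom; assumes the reported sigma is the sample standard deviation
final class WelchSketch {

    record Sample(double mean, double stdDev, int runs) {
        /** Estimated variance of the mean: s^2 / n. */
        double varianceOfMean() {
            return stdDev * stdDev / runs;
        }
    }

    static double tStatistic(Sample a, Sample b) {
        return (a.mean() - b.mean())
            / Math.sqrt(a.varianceOfMean() + b.varianceOfMean());
    }

    static double degreesOfFreedom(Sample a, Sample b) {
        double va = a.varianceOfMean();
        double vb = b.varianceOfMean();
        double numerator = (va + vb) * (va + vb);
        double denominator =
            va * va / (a.runs() - 1) + vb * vb / (b.runs() - 1);
        return numerator / denominator;
    }

    public static void main(String[] args) {
        // values copied from the JDK 21 results above
        Sample withJdwp = new Sample(218.985, 1.924, 10);
        Sample withOnjcmd = new Sample(219.469, 1.185, 10);
        System.out.printf("t = %.3f, df = %.1f%n",
            tStatistic(withJdwp, withOnjcmd),
            degreesOfFreedom(withJdwp, withOnjcmd));
    }
}
```

A t statistic this close to zero at roughly 15 degrees of freedom is what makes the difference between the two configurations indistinguishable from noise.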

The question is then: Why has it been implemented in the JDK at all? Let’s run Renaissance on JDK 11.0.3, the first release supporting onjcmd.

Results on JDK 11.0.3

Here, using onjcmd results in a significant performance improvement of a factor of 1.4 (from 354 to 248 seconds) compared to running the JDWP agent without it:

Benchmark 1: without JDWP
  Time (mean ± σ):     234.011 s ±  2.182 s    [User: 5336.885 s, System: 706.926 s]
  Range (min … max):   229.605 s … 237.845 s    10 runs
 
Benchmark 2: with JDWP
  Time (mean ± σ):     353.572 s ± 20.300 s    [User: 4680.987 s, System: 643.978 s]
  Range (min … max):   329.610 s … 402.410 s    10 runs
 
Benchmark 3: with onjcmd
  Time (mean ± σ):     247.766 s ±  1.907 s    [User: 4690.555 s, System: 609.904 s]
  Range (min … max):   245.575 s … 251.026 s    10 runs
Summary
  "without JDWP" ran
    1.06 ± 0.01 times faster than "with onjcmd"
    1.51 ± 0.09 times faster than "with JDWP"

We excluded the finagle-chirper sub-benchmark here, as it causes the run-time to increase drastically. The sub-benchmark alone does not cause any problems, so the performance hit is possibly caused by the GC run before the sub-benchmark, which cleans up after the dotty sub-benchmark. Dotty is run directly before finagle-chirper.

Please be aware that the sub-benchmarks run on JDK 11 differ from those run on JDK 21, so don’t compare these results to the results for JDK 21.

But what explains this difference?

Fixes since JDK 11.0.3

Between JDK 11.0.3 and JDK 21, there have been improvements to the OpenJDK, some of which drastically improved the performance of the JVM in debugging mode. Most notable is the fix for JDK-8227269 by Roman Kennke. The issue, reported by Egor Ushakov, reads as follows:

Slow class loading when running with JDWP

When debug mode is active (-agentlib:jdwp), an application spends a lot of time in JVM internals like Unsafe.defineAnonymousClass or Class.getDeclaredConstructors. Sometimes this happens on EDT and UI freezes occur.

If we look into the code, we’ll see that whenever a new class is loaded and an event about it is delivered, when a garbage collection has occurred, classTrack_processUnloads iterates over all loaded classes to see if any of them have been unloaded. This leads to O(classCount * gcCount) performance, which in case of frequent GCs (and they are frequent, especially the minor ones) is close to O(classCount^2). In IDEA, we have quite a lot of classes, especially counting all lambdas, so this results in quite significant overhead.
JDK-8227269
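
The complexity claim in the issue can be made tangible with a toy cost model. This is not the JDWP agent code, just a counter under simplifying assumptions: classes are loaded one at a time, a garbage collection happens after every loadsPerGc class loads, and the first load event after a GC visits every class loaded so far. All names are hypothetical.

```java
// Toy cost model for the behavior described in JDK-8227269; hypothetical
final class ClassTrackCostModel {

    /**
     * Counts how many classes get visited in total when every class load
     * event that follows a GC rescans all classes loaded so far.
     */
    static long visits(int classCount, int loadsPerGc) {
        long visits = 0;
        for (int loaded = 1; loaded <= classCount; loaded++) {
            boolean gcSinceLastEvent = loaded % loadsPerGc == 0;
            if (gcSinceLastEvent) {
                visits += loaded; // full scan over all loaded classes
            }
        }
        return visits;
    }

    public static void main(String[] args) {
        // a GC before every load: 1 + 2 + ... + n, so quadratic growth
        System.out.println(visits(1_000, 1));
        System.out.println(visits(10_000, 1));
        // rarer GCs shrink the constant, not the growth pattern
        System.out.println(visits(10_000, 10));
    }
}
```

With a GC before every load, ten times more classes cost about a hundred times more visits, which is the quadratic behavior the issue describes.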

This change came into the JDK with 11.0.9. With 11.0.8, we still get results like those of 11.0.3, but with 11.0.9, we get the results of the current JDK 11:

Benchmark 1: without JDWP
  Time (mean ± σ):     234.647 s ±  2.731 s    [User: 5331.145 s, System: 701.760 s]
  Range (min … max):   228.510 s … 238.323 s    10 runs
 
Benchmark 2: with JDWP
  Time (mean ± σ):     250.043 s ±  3.587 s    [User: 4628.578 s, System: 716.737 s]
  Range (min … max):   242.515 s … 254.456 s    10 runs
 
Benchmark 3: with onjcmd
  Time (mean ± σ):     249.689 s ±  1.765 s    [User: 4788.539 s, System: 729.207 s]
  Range (min … max):   246.324 s … 251.559 s    10 runs
 
Summary
  "without JDWP" ran
    1.06 ± 0.01 times faster than "with onjcmd"
    1.07 ± 0.02 times faster than "with JDWP"

This clearly shows the significant impact of the change. 11.0.3 came out on Apr 18, 2019, and 11.0.9 on Jul 15, 2020, so the onjcmd feature improved on-demand debugging for more than a year.

Want to try this out yourself? Get the binaries from SapMachine and run the benchmarks yourself. This kind of performance archaeology is quite rewarding, giving you insights into critical performance issues.

Conclusion

A few years ago, it was definitely a good idea to add the onjcmd feature to have usable on-demand debugging performance-wise. But nowadays, we can just start the JDWP agent to wait for a connection and connect to it whenever we want to, without any measurable performance penalty (in the Renaissance benchmark).

This shows us that it is always valuable to reevaluate if specific features are worth the maintenance cost. I hope this blog post gave you some insights into the performance of on-demand debugging. See you next week for the next installment in my hello-ebpf series.

This article is part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone.

Let’s create a Python Debugger together: FOSDEM Talk

Posted on February 8, 2024 by Johannes Bechberger

A small addendum to the previous six parts of my journey down the Python debugger rabbit hole (part 1, part 2, part 3, part 4, part 5, and part 6).

I gave a talk on the topic of Python 3.12’s new monitoring and debugging API at FOSDEM’s Python Devroom:

Furthermore, I’m excited to announce my acceptance to PyCon Berlin this year. When I started my blog series last year, I would’ve never dreamed of speaking at a large Python conference. I’m probably the only OpenJDK developer there, but I’m happy to meet many new people from a different community.

This article is part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone.

Hello eBPF: Recording data in event buffers (3)

Posted on January 29, 2024 by Johannes Bechberger

Welcome back to my blog series on eBPF. Last week, I showed you how the eBPF program and Java application can communicate using eBPF maps. This allowed us to write an application that counts the number of execve calls per user.

This week, I’ll show you briefly how to use another kind of eBPF map, the perf event buffer, and run tests with docker and JUnit 5.

This blog post is shorter than the previous one as I’m preparing for the OpenJDK committers workshop in Brussels and my Python and Java DevRoom talks at FOSDEM. I’m happy to meet my readers; say hi when you’re there.

Perf Event Buffer

Data structures, like the hash map described in the previous blog post, are great for storing data but have their limitations when we want to pass new bits of information continuously from the eBPF program to our user-land application. This is especially pertinent when recording performance events. So, in 2015, the Linux kernel got a new map type: BPF_MAP_TYPE_PERF_EVENT_ARRAY. This map type functions as a fixed-size ring buffer that can store elements of a given size and is allocated per CPU. The eBPF program submits data to the buffer, and the user-land application retrieves it. When the buffer is full, data can’t be submitted, and a drop counter is incremented.
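
The drop-on-full behavior can be modeled in a few lines of Java. This is a conceptual sketch of a single per-CPU buffer, not how the kernel implements it, and the class DroppingRingBuffer is made up for this example.

```java
import java.util.Optional;

// Conceptual model of one per-CPU buffer; events must be non-null
final class DroppingRingBuffer<T> {
    private final Object[] slots;
    private int head;     // index of the oldest unread event
    private int size;     // number of unread events
    private long dropped; // events rejected because the buffer was full

    DroppingRingBuffer(int capacity) {
        slots = new Object[capacity];
    }

    /** Producer side (the eBPF program): fails instead of overwriting. */
    boolean submit(T event) {
        if (size == slots.length) {
            dropped++;
            return false;
        }
        slots[(head + size) % slots.length] = event;
        size++;
        return true;
    }

    /** Consumer side (the user-land application). */
    @SuppressWarnings("unchecked")
    Optional<T> poll() {
        if (size == 0) {
            return Optional.empty();
        }
        T event = (T) slots[head];
        slots[head] = null;
        head = (head + 1) % slots.length;
        size--;
        return Optional.of(event);
    }

    long dropped() {
        return dropped;
    }

    public static void main(String[] args) {
        DroppingRingBuffer<String> buffer = new DroppingRingBuffer<>(2);
        buffer.submit("first");
        buffer.submit("second");
        buffer.submit("third"); // buffer is full, so this event is dropped
        System.out.println(buffer.poll().orElseThrow());
        System.out.println(buffer.dropped());
    }
}
```

The consequence for applications is that a slow consumer loses the newest events, so the user-land side should poll often enough to keep up.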

Perf Event Buffers have their issues, as explained by Andrii Nakryiko, so in 2020, eBPF got ring buffers, which have less overhead. Perf Event Buffers are still used, as only Linux 5.8 and above supports ring buffers. It doesn’t make a difference for our toy examples, but I’ll show you how to use ring buffers in a few weeks.

You can read more about Perf Event Buffers in the Learning eBPF book by Liz Rice, pages 24 to 28.

Example

Now, to a small example, called chapter2.HelloBuffer, which records for every execve call the calling process id, the user id, and the current task name and transmits it to the Java application:

> ./run.sh chapter2.HelloBuffer
2852613 1000 code Hello World  # vs code
2852635 1000 code Hello World
2852667 1000 code Hello World
2852690 1000 code Hello World
2852742 1000 Sandbox Forked Hello World  # Firefox
2852760 1000 pool-4-thread-1 Hello World
2852760 1000 jspawnhelper Hello World    # Java ProcessBuilder
2852760 1000 jspawnhelper Hello World
2852760 1000 jspawnhelper Hello World
2852760 1000 jspawnhelper Hello World
2852760 1000 jspawnhelper Hello World
2852760 1000 jspawnhelper Hello World
2852760 1000 jspawnhelper Hello World
2852760 1000 jspawnhelper Hello World

This already gives us much more information than the simple counter from my last blog post. The eBPF program to achieve this is as follows:

BPF_PERF_OUTPUT(output);                                                 
                                                                         
struct data_t {                                                          
    int pid;                                                             
    int uid;                                                             
    char command[16];                                                    
    char message[12];                                                    
};                                                                       
                                                                         
int hello(void *ctx) {                                                   
    struct data_t data = {};                                             
    char message[12] = "Hello World";                                    
    
    // obtain process and user id                                                                     
    data.pid = bpf_get_current_pid_tgid() >> 32;                         
    data.uid = bpf_get_current_uid_gid() & 0xFFFFFFFF;                   
    
    // obtain the current task/thread/process name, 
    // without the folder, of the task that is currently
    // running                                                                     
    bpf_get_current_comm(&data.command, 
        sizeof(data.command));
    // "Safely attempt to read size bytes from kernel space
    //  address unsafe_ptr and store the data in dst." (man-page)           
    bpf_probe_read_kernel(&data.message, 
        sizeof(data.message), message); 
    
    // try to submit the data to the perf buffer                                                                     
    output.perf_submit(ctx, &data, sizeof(data));                        
                                                                         
    return 0;                                                            
}

You can get more information on bpf_get_current_comm and bpf_probe_read_kernel in the bpf-helpers(7) man-page.
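
The shift and the mask in the program above work because both helpers pack two 32-bit ids into one 64-bit value: bpf_get_current_pid_tgid returns the process id (tgid) in the upper half and the thread id in the lower half, and bpf_get_current_uid_gid returns the group id in the upper half and the user id in the lower half. The same unpacking in Java, with made-up class and method names and example values:

```java
// Unpacking two 32-bit ids from one 64-bit value; hypothetical names
final class PackedIdsSketch {

    /** Upper half, e.g. the process id of bpf_get_current_pid_tgid. */
    static int upper32(long packed) {
        return (int) (packed >>> 32);
    }

    /** Lower half, e.g. the user id of bpf_get_current_uid_gid. */
    static int lower32(long packed) {
        return (int) (packed & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        // example values only: process 2852613 with thread 2852620
        long pidTgid = (2852613L << 32) | 2852620L;
        // example values only: group 1000 and user 1000
        long uidGid = (1000L << 32) | 1000L;
        System.out.println(upper32(pidTgid));
        System.out.println(lower32(uidGid));
    }
}
```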

The Java application that reads the buffer and prints the obtained information is not too dissimilar from the example in my previous blog post. We first define the Data type:

record Data(
   int pid, 
   int uid, 
   // we model char arrays as Strings
   // with a size annotation
   @Size(16) String command,
   @Size(12) String message) {}                                                                                                                              
 
// we have to model the data type as before                                                                                                                              
static final BPFType.BPFStructType<Data> DATA_TYPE = 
   new BPFType.BPFStructType<>("data_t",                              
        List.of(                                                                                                               
                new BPFType.BPFStructMember<>("pid", 
                     BPFType.BPFIntType.INT32, 0, Data::pid),                                  
                new BPFType.BPFStructMember<>("uid", 
                     BPFType.BPFIntType.INT32, 4, Data::uid),                                  
                new BPFType.BPFStructMember<>("command", 
                     new BPFType.StringType(16), 8, Data::command),                        
                new BPFType.BPFStructMember<>("message", 
                     new BPFType.StringType(12), 24, Data::message)),                      
        new BPFType.AnnotatedClass(Data.class, List.of()),                                                                     
            objects -> new Data((int) objects.get(0), 
                                (int) objects.get(1), 
                                (String) objects.get(2),
                                (String) objects.get(3)));

You might recognize that the BPF types now have the matching Java type in their type signature. I added this to have more type safety and less casting.
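
The offsets 0, 4, 8, and 24 in the struct type describe where each member sits in the 36 bytes of a data_t value. What a parser does with such a layout can be sketched with a plain ByteBuffer. This is not the BPFStructType implementation, which uses the Panama API; it assumes a little-endian machine, and the class DataLayoutSketch and its methods are invented for this example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hand-written parser for the data_t layout; hypothetical, little-endian
final class DataLayoutSketch {

    record Data(int pid, int uid, String command, String message) {}

    /** 4 (pid) + 4 (uid) + 16 (command) + 12 (message) bytes. */
    static final int SIZE = 36;

    /** Reads a NUL-terminated string from a fixed-size char array. */
    static String readCString(ByteBuffer buf, int offset, int maxLen) {
        int len = 0;
        while (len < maxLen && buf.get(offset + len) != 0) {
            len++;
        }
        byte[] bytes = new byte[len];
        for (int i = 0; i < len; i++) {
            bytes[i] = buf.get(offset + i);
        }
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    static Data parse(ByteBuffer raw) {
        // duplicate() leaves position and byte order of the caller untouched
        ByteBuffer buf = raw.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        return new Data(
            buf.getInt(0),             // pid at offset 0
            buf.getInt(4),             // uid at offset 4
            readCString(buf, 8, 16),   // command at offset 8
            readCString(buf, 24, 12)); // message at offset 24
    }

    public static void main(String[] args) {
        ByteBuffer buf =
            ByteBuffer.allocate(SIZE).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(0, 2852613).putInt(4, 1000);
        buf.position(8);
        buf.put("code".getBytes(StandardCharsets.US_ASCII));
        buf.position(24);
        buf.put("Hello World".getBytes(StandardCharsets.US_ASCII));
        System.out.println(parse(buf));
    }
}
```

Writing this by hand for every struct is exactly the kind of repetitive work that the struct type definition above takes over.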

To retrieve the events from the buffer, we first have to open it and pass in a call-back. This call-back is called for every available event when we call PerfEventArray#perf_buffer_poll:

try (var b = BPF.builder("""                                                                                                    
        ...                                                                                                                     
        """).build()) {                                                                                                         
    var syscall = b.get_syscall_fnname("execve");                                                                               
    b.attach_kprobe(syscall, "hello");                                                                                          
                                                                                                                                
    BPFTable.PerfEventArray.EventCallback<Data> print_event = 
      (/* PerfEventArray instance */ array, 
       /* cpu id of the event */     cpu, 
       /* event data */              data, 
       /* size of the event data */  size) -> {                                     
        var d = array.event(data);                                                                                              
        System.out.printf("%d %d %s %s%n", 
            d.pid(), d.uid(), d.command(), d.message());                                         
    };                                                                                                                          
                                                                                                                                
    try (var output = b.get("output", 
         BPFTable.PerfEventArray.<Data>createProvider(DATA_TYPE))
             .open_perf_buffer(print_event)) { 
        while (true) {
            // wait until events are available;
            // you can pass a timeout in milliseconds
            b.perf_buffer_poll();                                                                                               
        }                                                                                                                       
    }                                                                                                                           
}

Tests

I’m happy to announce that hello-ebpf now has its own test runner, which uses virtme and docker to run all tests in their own runtime with their own kernel. All this is wrapped in my testutil/bin/java wrapper so that you can run the tests using mvn test:

mvn test -Djvm=testutil/bin/java

And the best part? All tests are written using plain JUnit 5. As an example, here is the HelloWorld test:

public class HelloWorldTest {
    @Test
    public void testHelloWorld() throws Exception {
        try (BPF b = BPF.builder("""
                int hello(void *ctx) {
                   bpf_trace_printk("Hello, World!");
                   return 0;
                }
                """).build()) {
            var syscall = b.get_syscall_fnname("execve");
            b.attach_kprobe(syscall, "hello");
            Utils.runCommand("uname", "-r");
            // read the first trace line
            var line = b.trace_readline();
            // assert its content
            assertTrue(line.contains("Hello, World!"));
        }
    }
}

There are currently only two tests, but I plan to add many more.

Conclusion

In this blog post, we learned about Perf Event Buffers, a valuable data structure for repeatedly pushing information from the eBPF program to the user-land application. Implementing this feature, we’re getting closer and closer to completing chapter 2 of the Learning eBPF book. Truth be told, the implementation in the GitHub repository supports enough of the BCC to implement the remaining examples and even the exercises from Chapter 2.

In the next part of the hello-ebpf series, I’ll show you how to tail call in eBPF to other eBPF functions and how to write your first eBPF application that uses the hello-ebpf library as a dependency.

This article is part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone.

Hello eBPF: Recording data in basic eBPF maps (2)

Posted on January 12, 2024 by Johannes Bechberger

Welcome back to my blog series on eBPF. Last week, I introduced eBPF, the series, and the project and showed how you can write a simple eBPF application with Java that prints “Hello World!” whenever a process calls execve:

public class HelloWorld {
  public static void main(String[] args) {
    try (BPF b = BPF.builder("""
            int hello(void *ctx) {
               bpf_trace_printk("Hello, World!");
               return 0;
            }
            """).build()) {
      var syscall = b.get_syscall_fnname("execve");
      b.attach_kprobe(syscall, "hello");
      b.trace_print();
    }
  }
}

But what if we want to send more information from our eBPF program to our userland application than just some logs? For example, we might want to share the accumulated number of execve calls that the processes of a specific user made and transmit information akin to:

record Data(
     /** user id */
     @Unsigned long uid,
     /** group id */
     @Unsigned long gid, 
     /** count of execve calls */
     @Unsigned int counter) {}

This is what this week’s blog post is all about.

Communication

When two regular programs want to share information, they either send data via sockets or use shared memory that both programs can access:

eBPF uses neither of these two approaches: Working with sockets makes a shared state hard to maintain, and using shared memory is difficult because the eBPF program lives in the kernel and the Java program in userland. Accessing any userland memory from eBPF at all is deemed to be experimental, according to the official BPF Design Q&A:

Q: Can BPF overwrite arbitrary user memory?

A: Sort-of.

Tracing BPF programs can overwrite the user memory of the current task with bpf_probe_write_user(). Every time such program is loaded the kernel will print warning message, so this helper is only useful for experiments and prototypes. Tracing BPF programs are root only.
BPF Design Q&A

But how can we then communicate? This is where eBPF maps come in:

BPF ‘maps’ provide generic storage of different types for sharing data between kernel and user space. There are several storage types available, including hash, array, bloom filter and radix-tree. Several of the map types exist to support specific BPF helpers that perform actions based on the map contents.

BPF maps are accessed from user space via the bpf syscall, which provides commands to create maps, lookup elements, update elements and delete elements.
LINUX Kernel Documentation

These fixed-size data structures form the backbone of every eBPF application, and their support is vital to creating any non-trivial tool.

Using basic eBPF maps

Using these maps, we can implement our execve-call-counter eBPF program. We start with the simple version that just stores the counter in a simple user-id-to-counter hash map:

// macro to create a uint64_t to uint64_t hash map
BPF_HASH(counter_table);

// u64 (also known as uint64_t) is an unsigned
// integer with a width of 64 bits
// in Java terms, it's the unsigned version
// of long

int hello(void *ctx) {
   u64 uid;
   u64 counter = 0;
   u64 *p;

   uid = bpf_get_current_uid_gid() & 0xFFFFFFFF;
   p = counter_table.lookup(&uid);
   // p is null if the element is not in the map
   if (p != 0) {
      counter = *p;
   }
   counter++;
   counter_table.update(&uid, &counter);
   return 0;
}

This example is from the Learning eBPF book by Liz Rice, pages 21 to 23, where you can find a different take. And if you’re wondering why we’re using u64 instead of the more standard uint64_t, this is because the Linux kernel predates the definition of uint64_t (and other such types) in stdint.h (see StackOverflow), although today it’s possible to use both.

In this example, we first create a hash map called counter_table using the bcc macro BPF_HASH. We can access the hash map using the bcc-only methods lookup and update, which are convenience wrappers for void *bpf_map_lookup_elem(struct bpf_map *map, const void *key) and long bpf_map_update_elem(struct bpf_map *map, const void *key, const void *value, u64 flags) (see the bpf-helpers man-page). Additionally, we use bpf_get_current_uid_gid() to get the current user-id:

u64 bpf_get_current_uid_gid(void)

Description Get the current uid and gid.

Return A 64-bit integer containing the current GID and UID, and created as such: current_gid << 32 | current_uid.
bpf-helpers man-page
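
To make the lookup/update logic and the bit layout of the return value more tangible, here is a small userland simulation in plain Java. It is only a sketch: the class ExecveCounterSketch and its methods are made up for illustration and are not part of hello-ebpf or bcc, and a java.util.HashMap stands in for the eBPF map.

```java
import java.util.HashMap;
import java.util.Map;

/** Userland simulation of the counting logic; this is NOT eBPF code. */
final class ExecveCounterSketch {
    // stands in for BPF_HASH(counter_table)
    private final Map<Long, Long> counterTable = new HashMap<>();

    /** The lower 32 bits of the combined value hold the user id. */
    static long uid(long uidGid) {
        return uidGid & 0xFFFFFFFFL;
    }

    /** The upper 32 bits of the combined value hold the group id. */
    static long gid(long uidGid) {
        return uidGid >>> 32;
    }

    /** Mirrors hello(): look up, default to 0, increment, write back. */
    void hello(long uidGid) {
        long uid = uid(uidGid);
        Long p = counterTable.get(uid);      // lookup: null if the key is absent
        long counter = (p != null) ? p : 0L;
        counter++;
        counterTable.put(uid, counter);      // update
    }

    /** Current counter for a user id, 0 if the user never called hello. */
    long count(long uid) {
        return counterTable.getOrDefault(uid, 0L);
    }

    public static void main(String[] args) {
        var sketch = new ExecveCounterSketch();
        long rootCall = 0L;                      // gid 0, uid 0
        long userCall = (1000L << 32) | 1000L;   // gid 1000, uid 1000
        sketch.hello(rootCall);
        sketch.hello(userCall);
        sketch.hello(userCall);
        System.out.printf("ID 0: %d\tID 1000: %d%n",
                          sketch.count(0), sketch.count(1000));
    }
}
```

The mask 0xFFFFFFFF in the eBPF program corresponds to uid(...) here; shifting right by 32 bits, as in gid(...), yields the group id.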

A side note regarding naming: “table” and “map” are used interchangeably in the bcc Python-API and related examples, which I carried over into the Java-API for consistency.

Now to the userland program: The hello-ebpf Java API offers methods to access these maps and can be used to write a userland program, HelloMap, that prints the contents of the maps every few seconds:

public class HelloMap {
    public static void main(String[] args) 
      throws InterruptedException {
        try (var b = BPF.builder("""
                ...
                """).build()) {
            var syscall = b.get_syscall_fnname("execve");
            // attach the eBPF program to execve
            b.attach_kprobe(syscall, "hello");
            // create a mirror for the hash table eBPF map
            BPFTable.HashTable<Long, Long> counterTable = 
               b.get_table("counter_table", 
                           UINT64T_MAP_PROVIDER);
            while (true) {
                Thread.sleep(2000);
                // the map mirror implements the Java Map
                // interface with methods like 
                // Map.entrySet
                for (var entry : counterTable.entrySet()) {
                    System.out.printf("ID %d: %d\t", 
                                      entry.getKey(), 
                                      entry.getValue());
                }
                System.out.println();
            }
        }
    }
}

This program attaches the eBPF program to the execve system call and uses the HashTable map mirror to access the map counter_table.

You can run the example using the run.sh script (after you have built the project via the build.sh script) as root on an x86 Linux:

> ./run.sh chapter2.HelloMap
ID 0: 1 ID 1000: 3
ID 0: 1 ID 1000: 3
ID 0: 1 ID 1000: 4
ID 0: 1 ID 1000: 11
ID 0: 1 ID 1000: 11
ID 0: 1 ID 1000: 12
...
ID 0: 22 ID 1000: 176

Here, user 0 is the root user, and user 1000 is my non-root user; I called ls in the shell with both users a few times to gather some data.

But maybe my map mirror is broken, and this data is just a fluke? It’s always good to have a way to check the content of the maps. This is where bpftool-map comes into play: We can use

> bpftool map list
2: prog_array  name hid_jmp_table  flags 0x0
        key 4B  value 4B  max_entries 1024  memlock 8512B
        owner_prog_type tracing  owner jited
40: hash  name counter_table  flags 0x0
        key 8B  value 8B  max_entries 10240  memlock 931648B
        btf_id 142

> bpftool map dump name counter_table
[{
        "key": 1000,
        "value": 163
    },{
        "key": 0,
        "value": 22
    }
]

We can see that the numbers printed by our example are in the correct ballpark.

To learn more about the features of bpftool, I highly recommend reading the article “Features of bpftool: the thread of tips and examples to work with eBPF objects” by Quentin Monnet.

Storing simple numbers in a map is great, but what if we want to keep more complex information as values in the map, like the Data record with user-id, group-id, and counter from the beginning of this article?

The most recent addition to the hello-ebpf project is the support of record/struct values in maps:

Storing more complex structs in maps

The eBPF code for this example is a slight extension of the previous example:

// record Data(
//    @Unsigned long uid, 
//    @Unsigned long gid, 
//    @Unsigned int  counter
// ){}
struct data_t {
   u64 uid;
   u64 gid;
   u32 counter;
};
                
// u64 to data_t map
BPF_HASH(counter_table, u64, struct data_t);
                
int hello(void *ctx) {
   // get user id
   u64 uid = bpf_get_current_uid_gid() & 0xFFFFFFFF;
   // get group id
   u64 gid = bpf_get_current_uid_gid() >> 32;
   // create data object 
   // with uid, gid and counter=0
   struct data_t info = {uid, gid, 0};
   struct data_t *p = counter_table.lookup(&uid);
   if (p != 0) {
      info = *p;
   }
   info.counter++;
   counter_table.update(&uid, &info);
   return 0;
}

The Java application is slightly more complex, as we have to model the data_t struct in Java. We start by defining the record Data as before:

record Data(
     /** user id */
     @Unsigned long uid,
     /** group id */
     @Unsigned long gid, 
     /** count of execve calls */
     @Unsigned int counter) {}

The @Unsigned annotation is part of the ebpf-annotations module and allows you to document type properties that aren’t present in Java.
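
Since Java has no unsigned primitive types, a u32 or u64 value ends up in a signed int or long with the same bit pattern. The following sketch shows how such values can be read as unsigned using only JDK methods; the class UnsignedSketch is a hypothetical helper for illustration and not part of the ebpf-annotations module.

```java
final class UnsignedSketch {
    /** Interprets the 32 bits of a Java int as a u32 value. */
    static long u32(int raw) {
        return Integer.toUnsignedLong(raw);
    }

    /** Formats the 64 bits of a Java long as a u64 decimal string. */
    static String u64(long raw) {
        return Long.toUnsignedString(raw);
    }

    public static void main(String[] args) {
        // 4,000,000,000 fits into a u32, but not into a signed int
        int counter = (int) 4_000_000_000L;
        System.out.println(counter);       // signed view of the bits
        System.out.println(u32(counter));  // unsigned view of the same bits
        System.out.println(u64(-1L));      // all 64 bits set
    }
}
```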

The mirror BPFType for structs in hello-ebpf is BPFType.BPFStructType:

/**
 * Struct
 *
 * @param bpfName     name of the struct in BPF
 * @param members     members of the struct, 
 *                    order should be the same as 
 *                    in the constructor
 * @param javaClass   class that represents the struct
 * @param constructor constructor that takes the members 
 *                    in the same order as 
 *                    in the constructor
 */
record BPFStructType(String bpfName, 
                    List<BPFStructMember> members, 
                    AnnotatedClass javaClass,
                    Function<List<Object>, ?> constructor) 
    implements BPFType

It models struct members as follows:

/**
 * Struct member
 *
 * @param name   name of the member
 * @param type   type of the member
 * @param offset offset from the start of the struct in bytes
 * @param getter function that takes the struct and returns the member
 */
record BPFStructMember(String name, 
                       BPFType type, 
                       int offset, 
                       Function<?, Object> getter)

With these classes, we can model our data_t struct as follows:

BPFType.BPFStructType DATA_TYPE = 
    new BPFType.BPFStructType("data_t",
        List.of(
          new BPFType.BPFStructMember(
            "uid", 
            BPFType.BPFIntType.UINT64, 
            /* offset */ 0, (Data d) -> d.uid()),
          new BPFType.BPFStructMember(
            "gid", 
            BPFType.BPFIntType.UINT64, 
            8, (Data d) -> d.gid()),
          new BPFType.BPFStructMember(
            "counter", 
            BPFType.BPFIntType.UINT32, 
            16, (Data d) -> d.counter())),
        new BPFType.AnnotatedClass(Data.class, List.of()),
            objects -> 
              new Data((long) objects.get(0), 
                       (long) objects.get(1), 
                       (int) objects.get(2)));

This is cumbersome, I know, but it will get easier soon, I promise.
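
The offsets 0, 8, and 16 used in DATA_TYPE follow from the usual C struct layout rules. As a rough sketch, assuming every member is a scalar whose alignment equals its size (which holds for u32 and u64 on x86_64, but not for char arrays), they can be computed as follows. The class StructLayoutSketch and its methods are hypothetical and not part of hello-ebpf.

```java
import java.util.Arrays;

final class StructLayoutSketch {
    /** Rounds offset up to the next multiple of alignment. */
    static int alignUp(int offset, int alignment) {
        return (offset + alignment - 1) / alignment * alignment;
    }

    /** Offsets of scalar members, given their sizes in declaration order. */
    static int[] offsets(int... memberSizes) {
        int[] result = new int[memberSizes.length];
        int offset = 0;
        for (int i = 0; i < memberSizes.length; i++) {
            // assumption: alignment of a scalar member equals its size
            offset = alignUp(offset, memberSizes[i]);
            result[i] = offset;
            offset += memberSizes[i];
        }
        return result;
    }

    /** Total struct size, padded to the largest member alignment. */
    static int totalSize(int... memberSizes) {
        int[] offs = offsets(memberSizes);
        int end = 0;
        int maxAlignment = 1;
        for (int i = 0; i < memberSizes.length; i++) {
            end = offs[i] + memberSizes[i];
            maxAlignment = Math.max(maxAlignment, memberSizes[i]);
        }
        return alignUp(end, maxAlignment);
    }

    public static void main(String[] args) {
        // struct data_t { u64 uid; u64 gid; u32 counter; }
        System.out.println(Arrays.toString(offsets(8, 8, 4)));
        System.out.println(totalSize(8, 8, 4));
    }
}
```

Under these assumptions, data_t occupies 24 bytes: the 4-byte counter is followed by 4 bytes of padding so that the struct size stays a multiple of 8.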

The DATA_TYPE type can then be passed to the BPFTable.HashTable to create the UINT64T_DATA_MAP_PROVIDER:

BPFTable.TableProvider<BPFTable.HashTable<@Unsigned Long, Data>> 
    UINT64T_DATA_MAP_PROVIDER =
        (/* BPF object */ bpf, 
         /* map id in eBPF */ mapId, 
         /* file descriptor of the map */ mapFd, 
         /* name of the map */ name) ->
                new BPFTable.HashTable<>(
                     bpf, mapId, mapFd, 
                     /* key type */   BPFType.BPFIntType.UINT64, 
                     /* value type */ DATA_TYPE, 
                     name);

We use this provider to access the map with BPF#get_table:

public class HelloStructMap {

    // ...

    public static void main(String[] args) 
      throws InterruptedException {
        try (var b = BPF.builder("""
                // ...
                """).build()) {
            var syscall = b.get_syscall_fnname("execve");
            b.attach_kprobe(syscall, "hello");

            var counterTable = b.get_table("counter_table", 
                 UINT64T_DATA_MAP_PROVIDER);
            while (true) {
                Thread.sleep(2000);
                for (var value : counterTable.values()) {
                    System.out.printf(
                       "ID %d (GID %d): %d\t", 
                       value.uid(), value.gid(), 
                       value.counter());
                }
                System.out.println();
            }
        }
    }
}

We can run the example and get the additional information:

> ./run.sh own.HelloStructMap
ID 0 (GID 0): 1 ID 1000 (GID 1000): 3
ID 0 (GID 0): 1 ID 1000 (GID 1000): 9
...
ID 0 (GID 0): 1 ID 1000 (GID 1000): 13
ID 0 (GID 0): 5 ID 1000 (GID 1000): 14

> bpftool map dump name counter_table
[{
        "key": 0,
        "value": {
            "uid": 0,
            "gid": 0,
            "counter": 5
        }
    },{
        "key": 1000,
        "value": {
            "uid": 1000,
            "gid": 1000,
            "counter": 13
        }
    }
]

Granted, it doesn’t give you more insights into the observed system, but it is a showcase of the current state of the map support in hello-ebpf.

Conclusion

eBPF maps are the primary way to communicate information between the eBPF program and the userland application. With this blog post, hello-ebpf gained support for basic eBPF hash maps and the ability to store structures in these maps. But of course, hash maps are not the only type of maps; we’ll add support for other map types, like perf maps and queues, in the next blog posts, as well as making the struct definitions a little bit easier. So stay tuned.

This article is part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone. Thanks to Mohammed Aboullaite for answering my many questions.

Hello eBPF: Developing eBPF Apps in Java (1)

Posted on December 31, 2023 by Johannes Bechberger

eBPF allows you to attach programs directly to hooks in the Linux kernel, like hooks for networking or executing programs, without loading kernel modules. This has historically been used for writing custom packet filters in firewalls. Nowadays, it is also used for monitoring and tracing, becoming an ever more critical building block of modern observability tools. To quote from ebpf.io:

Historically, the operating system has always been an ideal place to implement observability, security, and networking functionality due to the kernel’s privileged ability to oversee and control the entire system. At the same time, an operating system kernel is hard to evolve due to its central role and high requirement towards stability and security. The rate of innovation at the operating system level has thus traditionally been lower compared to functionality implemented outside of the operating system.

eBPF changes this formula fundamentally. It allows sandboxed programs to run within the operating system, which means that application developers can run eBPF programs to add additional capabilities to the operating system at runtime. The operating system then guarantees safety and execution efficiency as if natively compiled with the aid of a Just-In-Time (JIT) compiler and verification engine. This has led to a wave of eBPF-based projects covering a wide array of use cases, including next-generation networking, observability, and security functionality.

Today, eBPF is used extensively to drive a wide variety of use cases: Providing high-performance networking and load-balancing in modern data centers and cloud native environments, extracting fine-grained security observability data at low overhead, helping application developers trace applications, providing insights for performance troubleshooting, preventive application and container runtime security enforcement, and much more. The possibilities are endless, and the innovation that eBPF is unlocking has only just begun.

Writing eBPF apps

On the lowest level, eBPF programs are compiled down to eBPF bytecode and attached to hooks in the kernel via a syscall. This is tedious, so many libraries for eBPF allow you to write applications using and interacting with eBPF in C++, Rust, Go, Python, and even Lua.

But there are none for Java, which is a pity. So… I decided to write bindings using the new Foreign Function API (Project Panama, preview in 21) and bcc, the first and widely used library for eBPF, which is typically used with its Python API and allows you to write eBPF programs in C, compiling eBPF programs dynamically at runtime.

That’s why I wrote From C to Java Code using Panama a few weeks ago.

Anyway, I’m starting my new blog series and eBPF library hello-ebpf:

Let’s discover eBPF together. Join me on the journey to write all examples from the Learning eBPF book (get it also from Bookshop.org, Amazon, or O’Reilly) by Liz Rice and more in Java, implementing a Java library for eBPF along the way, with a blog series to document the journey. I highly recommend reading the book alongside my articles; for this blog post, I read the book till page 18.

The project is still in its infancy, but I hope that we can eventually extend the overview image from ebpf.io with a duke:

Goals

The main goal is to provide a library (and documentation) for Java developers to explore eBPF and write their own eBPF programs without leaving their favorite language and runtime.

The initial goal is to be as close to the BCC Python API as possible to port the book’s examples to Java easily. You can find the Java versions of the examples in the src/main/me/bechberger/samples and the API in the src/main/me/bechberger/bcc directory in the GitHub repository.

Implementation

The Python API is just a wrapper around the bcc library using the built-in ctypes module, extending the raw bindings to improve usability. The initial implementation of the library is a translation of the Python code to Java 21 code with Panama for FFI.

For example, the following method of the Python API

    def get_syscall_fnname(self, name):
        name = _assert_is_bytes(name)
        return self.get_syscall_prefix() + name

is translated into Java as follows:

    public String get_syscall_fnname(String fnName) {
        return get_syscall_prefix() + fnName;
    }

This is the reason why the library has the same license as the Python API, Apache 2.0. The API is purposefully close to the Python API and only deviates where absolutely necessary, adding a few helper methods to improve it slightly. This makes it easier to work with the examples from the book and speeds up the initial development. But finishing a translation of the Python API is not the end goal:

Plans

A look ahead into the future so you know what to expect:

Implement the full API so that we can recreate all bcc examples from the book
Make it adequately available as a library on Maven Central
Support the newer libbpf library
Allow writing eBPF programs in Java

These plans might change, but I’ll try to keep this current. I’m open to suggestions, contributions, and ideas.

Contributing

Contributions are welcome; just open an issue or a pull request. Discussions take place in the discussions section of the GitHub repository. Please spread the word if you like it; this greatly helps the project.

I’m happy to include more example programs, API documentation, helper methods, and links to repositories and projects that use this library.

Running the first example

The Java library is still in its infancy, but it can already run the most basic eBPF program from the book, which prints “Hello World!” every time a new program is started via the execve system call:

> ./run.sh bcc.HelloWorld
           <...>-30325   [042] ...21 10571.161861: bpf_trace_printk: Hello, World!
             zsh-30325   [004] ...21 10571.164091: bpf_trace_printk: Hello, World!
             zsh-30325   [115] ...21 10571.166249: bpf_trace_printk: Hello, World!
             zsh-39907   [127] ...21 10571.167210: bpf_trace_printk: Hello, World!
             zsh-30325   [115] ...21 10572.231333: bpf_trace_printk: Hello, World!
             zsh-30325   [060] ...21 10572.233574: bpf_trace_printk: Hello, World!
             zsh-30325   [099] ...21 10572.235698: bpf_trace_printk: Hello, World!
             zsh-39911   [100] ...21 10572.236664: bpf_trace_printk: Hello, World!
 MediaSu~isor #3-19365   [064] ...21 10573.417254: bpf_trace_printk: Hello, World!
 MediaSu~isor #3-22497   [000] ...21 10573.417254: bpf_trace_printk: Hello, World!
 MediaPD~oder #1-39914   [083] ...21 10573.418197: bpf_trace_printk: Hello, World!
 MediaSu~isor #3-39913   [116] ...21 10573.418249: bpf_trace_printk: Hello, World!

This helps you track the processes that use execve and lets you observe that Firefox (via MediaSu~isor) creates many processes and see whenever a Z-Shell creates a new process.

The related code can be found in chapter2/HelloWorld.java:

public class HelloWorld {
  public static void main(String[] args) {
    try (BPF b = BPF.builder("""
            int hello(void *ctx) {
               bpf_trace_printk("Hello, World!");
               return 0;
            }
            """).build()) {
      var syscall = b.get_syscall_fnname("execve");
      b.attach_kprobe(syscall, "hello");
      b.trace_print();
    }
  }
}

The eBPF program appends a “Hello World” trace message to the /sys/kernel/debug/tracing/trace DebugFS file via bpf_trace_printk every time the hello method is called. The trace has the following format: “<current task, e.g. zsh>-<process id> [<CPU id the task is running on>] <options> <timestamp>: <appending ebpf method>: <actual message, like 'Hello World'>”. But bpf_trace_printk is slow; it should only be used for debugging purposes.
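
To illustrate the trace format, here is a small sketch that splits such a line into its components with a regular expression. The class TraceLineSketch and the record TraceLine are hypothetical names chosen for this example, not part of the hello-ebpf API, and the pattern is only derived from the sample output shown above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TraceLineSketch {
    /** Components of one trace line, named after the format description. */
    record TraceLine(String task, int pid, int cpu, String options,
                     String timestamp, String function, String message) {}

    // <task>-<pid> [<cpu>] <options> <timestamp>: <function>: <message>
    private static final Pattern FORMAT = Pattern.compile(
        "^\\s*(.+)-(\\d+)\\s+\\[(\\d+)\\]\\s+(\\S+)\\s+([\\d.]+):\\s+(\\w+):\\s*(.*)$");

    static TraceLine parse(String line) {
        Matcher m = FORMAT.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("unexpected trace line: " + line);
        }
        return new TraceLine(m.group(1),
                             Integer.parseInt(m.group(2)),
                             Integer.parseInt(m.group(3)),
                             m.group(4), m.group(5), m.group(6), m.group(7));
    }

    public static void main(String[] args) {
        var line = " MediaSu~isor #3-19365   [064] ...21 10573.417254: "
                 + "bpf_trace_printk: Hello, World!";
        System.out.println(parse(line));
    }
}
```

The task name may itself contain spaces and dashes, which is why the pattern matches it greedily and anchors on the process id followed by the bracketed CPU id.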

The Java code attaches the hello method to the execve system call and then prints the lines from the /sys/kernel/debug/tracing/trace file. The program is equivalent to the Python code from the book. But, of course, many features have not yet been implemented and so the programs you can write are quite limited.

Conclusion

eBPF is an integral part of the modern observability tech stack. The hello-ebpf Java library will allow you to write eBPF applications directly in Java for the first time. This is an enormous undertaking for a side project so it will take some time. With my new blog series, you can be part of the journey, learning eBPF and building great tools.

I plan to write a blog post every few weeks and hope you join me. You wouldn’t be the first: Mohammed Aboullaite has already entered and helped me with his eBPF expertise. The voyage will hopefully take us from the first hello world examples shown in this blog post to a fully fledged Java eBPF library.

This article is part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone. Thank you to Martin Dörr and Lukas Werling who helped in the preparation of this article.

From C to Java Code using Panama

Posted on December 11, 2023 by Johannes Bechberger

The Foreign Function & Memory API (also called Project Panama) has come a long way since it started. You can find the latest version implemented in JDK 21 as a preview feature (use --enable-preview to enable it); the final API is specified by JEP 454:

By efficiently invoking foreign functions (i.e., code outside the JVM), and by safely accessing foreign memory (i.e., memory not managed by the JVM), the API enables Java programs to call native libraries and process native data without the brittleness and danger of JNI.
JEP 454

This is pretty helpful when trying to build wrappers around existing native libraries. Other languages, like Python with ctypes, have had this for a long time, but Java is getting a proper API for native interop, too. Of course, there is the Java Native Interface (JNI), but JNI is cumbersome and inefficient (call-sites aren’t inlined, and the overhead of converting data from Java to the native world and back is huge).

Be aware that the API is still in flux. Much of the existing non-OpenJDK documentation is not in sync.

Example

Now to my main example: Assume you’re tired of all the abstraction of the Java I/O API and just want to read a file using the traditional I/O functions of the C standard library (like read_line.c). We’re trying to read the first line of the passed file, opening the file via fopen, reading the first line via fgets, and closing the file via fclose.

#include "stdio.h"
#include "stdlib.h"

int main(int argc, char *argv[]) {
  FILE* file = fopen(argv[1], "r");
  char* line = malloc(1024);
  fgets(line, 1024, file);
  printf("%s", line);
  fclose(file);
  free(line);
}

This would have involved writing C code in the old JNI days, but we can access the required C functions directly with Panama, wrapping the C functions and writing the C program as follows in Java:

public static void main(String[] args) {
    var file = fopen(args[0], "r");
    var line = gets(file, 1024);
    System.out.println(line);
    fclose(file);
}

But how do we implement the wrapper methods? We start with the FILE* fopen(char* file, char* mode) function which opens a file. Before we can call it, we have to get hold of its MethodHandle:

private static MethodHandle fopen = Linker.nativeLinker().downcallHandle(
        lookup("fopen"),
        FunctionDescriptor.of(/* return */ ValueLayout.ADDRESS, 
            /* char* file */ ValueLayout.ADDRESS, 
            /* char* mode */ ValueLayout.ADDRESS));

This looks up the fopen symbol in all the libraries that the current process has loaded, asking both the NativeLinker and the SymbolLookup. This code is used in many examples, so we move it into the function lookup:

public static MemorySegment lookup(String symbol) {
    return Linker.nativeLinker().defaultLookup().find(symbol)
                 .or(() -> SymbolLookup.loaderLookup().find(symbol))
                 .orElseThrow();
}

The look-up returns the memory address at which the looked-up function is located.

We can proceed with the address of fopen and use it to create a MethodHandle that calls down from the JVM into native code. For this, we also have to specify the descriptor of the function so that the JVM knows how to call the fopen handle properly.

But how do we use this handle? Every handle has an invokeExact function (and an invoke function that allows the JVM to convert data) that we can use. The only problem is that we want to pass strings to the fopen call. We cannot pass the strings directly but instead have to allocate them onto the C heap, copying the chars into a C string:

public static MemorySegment fopen(String filename, String mode) {
    try (var arena = Arena.ofConfined()) {
        return (MemorySegment) fopen.invokeExact(
                arena.allocateUtf8String(filename),
                arena.allocateUtf8String(mode));
    } catch (Throwable t) {
        throw new RuntimeException(t);
    }
}

In JDK 22 allocateUtf8String changes to allocateFrom (thanks Brice Dutheil for spotting this).

We use a confined arena for allocations, which is cleaned up after exiting the try-with-resources block. The newly allocated strings are then used to invoke fopen, letting us return the FILE*.
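
The difference between invokeExact and invoke is not specific to Panama; it can be seen with any method handle. The following sketch uses a handle for Math.abs(int) from the standard java.lang.invoke API (the class InvokeSketch is a made-up name for this example). It also hints at why the wrapper above casts the result to MemorySegment: with invokeExact, the argument and return types at the call site have to match the handle's type exactly.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.WrongMethodTypeException;

final class InvokeSketch {
    // handle for static int Math.abs(int), i.e. of type (int)int
    static final MethodHandle ABS;

    static {
        try {
            ABS = MethodHandles.lookup().findStatic(
                Math.class, "abs", MethodType.methodType(int.class, int.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** Call-site type (int)int matches the handle type exactly. */
    static int exact(int x) {
        try {
            return (int) ABS.invokeExact(x);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    /** Call-site type (Integer)int: invoke unboxes the argument for us. */
    static int adapted(Integer x) {
        try {
            return (int) ABS.invoke(x);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    /** invokeExact with a boxed argument fails instead of converting. */
    static boolean exactRejectsBoxed(Integer x) {
        try {
            int ignored = (int) ABS.invokeExact(x);
            return false;
        } catch (WrongMethodTypeException e) {
            return true;
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(exact(-5));
        System.out.println(adapted(-5));
        System.out.println(exactRejectsBoxed(-5));
    }
}
```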

Older tutorials might mention MemorySessions, but they are removed in JDK 21.

After opening the file, we can focus on the char* fgets(char* buffer, int size, FILE* file) function. This function is passed a buffer of a given size, storing the next line from the passed file in the buffer.

Getting a MethodHandle is similar to fopen:

private static MethodHandle fgets = Linker.nativeLinker().downcallHandle(
        PanamaUtil.lookup("fgets"),
        FunctionDescriptor.of(ValueLayout.ADDRESS, 
                              ValueLayout.ADDRESS, 
                              ValueLayout.JAVA_INT, 
                              ValueLayout.ADDRESS));

Only the wrapper method differs because we have to allocate the buffer in the arena:

public static String gets(MemorySegment file, int size) {
    try (var arena = Arena.ofConfined()) {
        var buffer = arena.allocateArray(ValueLayout.JAVA_BYTE, size);
        var ret = (MemorySegment) fgets.invokeExact(buffer, size, file);
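        // compare with equals: the returned segment is a different
        // object than MemorySegment.NULL, even for a null pointer
        if (ret.equals(MemorySegment.NULL)) {
            return null; // error or end of file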
        }
        return buffer.getUtf8String(0);
    } catch (Throwable t) {
        throw new RuntimeException(t);
    }
}
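
The call to getUtf8String(0) reads the bytes from the start of the buffer up to the first NUL byte and decodes them as UTF-8. As a rough illustration of this step, here is the same idea on a plain byte array; CStringSketch and readCString are hypothetical names for this sketch and not part of the Panama API.

```java
import java.nio.charset.StandardCharsets;

final class CStringSketch {
    /** Decodes the NUL-terminated UTF-8 string at the start of buffer. */
    static String readCString(byte[] buffer) {
        int length = 0;
        // a C string ends at the first NUL byte (or at the buffer end)
        while (length < buffer.length && buffer[length] != 0) {
            length++;
        }
        return new String(buffer, 0, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // simulate what fgets leaves in a zero-initialized buffer
        byte[] buffer = new byte[16];
        byte[] line = "Apache\n".getBytes(StandardCharsets.UTF_8);
        System.arraycopy(line, 0, buffer, 0, line.length);
        System.out.print(readCString(buffer));
    }
}
```

Note that fgets keeps the trailing newline of the line it read, so the returned string usually ends with one.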

Finally, we can implement the int fclose(FILE* file) function to close the file:

private static MethodHandle fclose = Linker.nativeLinker().downcallHandle(
        PanamaUtil.lookup("fclose"),
        FunctionDescriptor.of(ValueLayout.JAVA_INT, ValueLayout.ADDRESS));

public static int fclose(MemorySegment file) {
    try {
        return (int) fclose.invokeExact(file);
    } catch (Throwable e) {
        throw new RuntimeException(e);
    }
}

You can find the source code in my panama-examples repository on GitHub (file HelloWorld.java) and run it on a Linux x86_64 machine via

> ./run.sh HelloWorld LICENSE # build and run
                                 Apache License

which prints the first line of the license file.

Errno

We didn’t care much about error handling here, but sometimes, we want to know precisely why a C function failed. Luckily, the C standard library on Linux and other Unixes has errno:

Several standard library functions indicate errors by writing positive integers to errno.
CPP Reference

On error, fopen returns a null pointer and sets errno. You can find information on all the possible error numbers on the man page for the open function.

We need a way to obtain errno directly after a call. To get it, we capture the call state by declaring the capture-call-state option when creating the MethodHandle for fopen:

try (var arena = Arena.ofConfined()) {
    // declare errno as state to be captured
    // directly after the downcall, without any interference from the
    // JVM runtime
    StructLayout capturedStateLayout = Linker.Option.captureStateLayout();
    VarHandle errnoHandle = 
        capturedStateLayout.varHandle(
            MemoryLayout.PathElement.groupElement("errno"));
    Linker.Option ccs = Linker.Option.captureCallState("errno");

    MethodHandle fopen = Linker.nativeLinker().downcallHandle(
            lookup("fopen"), 
            FunctionDescriptor.of(POINTER, POINTER, POINTER), 
            ccs);

    MemorySegment capturedState = arena.allocate(capturedStateLayout);
    try {
        // reading a non-existent file, this will set the errno
        MemorySegment result = 
            (MemorySegment) fopen.invoke(capturedState,
                // for our example we pick a file that doesn't exist
                // this ensures a proper error number
                arena.allocateUtf8String("nonexistent_file"),
                arena.allocateUtf8String("r"));
        int errno = (int) errnoHandle.get(capturedState);
        System.out.println(errno);
        return result;
    } catch (Throwable e) {
        throw new RuntimeException(e);
    }
}

To convert this error number into a string, we can use the char* strerror(int errno) function:

// a returned char* needs this target layout so that it can be read as a string
static AddressLayout POINTER = 
    ValueLayout.ADDRESS.withTargetLayout(
        MemoryLayout.sequenceLayout(JAVA_BYTE));
static MethodHandle strerror = Linker.nativeLinker()
        .downcallHandle(lookup("strerror"),
                FunctionDescriptor.of(POINTER, 
                    ValueLayout.JAVA_INT));

static String errnoString(int errno){
    try {
        MemorySegment str = 
            (MemorySegment) strerror.invokeExact(errno);
        return str.getUtf8String(0);
    } catch (Throwable t) {
        throw new RuntimeException(t);
    }
}

When we then print the error string in our example after the fopen call, we get:

No such file or directory

This is as expected, as we hard-coded a non-existent file in the fopen call.

JExtract

Creating all the MethodHandles manually can be pretty tedious and error-prone. JExtract can parse header files, generating MethodHandles and more automatically. You can download jextract on the project page.

For our example, I wrote a small wrapper around jextract that automatically downloads the latest version and calls it on the misc/headers.h file to create MethodHandles in the class Lib. The headers file includes all the necessary headers to run examples:

#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <stdio.h>

For the fopen function, for example, jextract generates the following entry point:

public static MethodHandle fopen$MH() {
    return RuntimeHelper.requireNonNull(constants$48.const$0,"fopen");
}
/**
 * {@snippet :
 * FILE* fopen(char* __filename, char* __modes);
 * }
 */
public static MemorySegment fopen(MemorySegment __filename, MemorySegment __modes) {
    var mh$ = fopen$MH();
    try {
        return (java.lang.foreign.MemorySegment)mh$.invokeExact(__filename, __modes);
    } catch (Throwable ex$) {
        throw new AssertionError("should not reach here", ex$);
    }
}

Of course, we still have to take care of the string allocation in our wrapper, but this wrapper gets significantly smaller:

public static MemorySegment fopen(String filename, String mode) {
    try (var arena = Arena.ofConfined()) {
        // using the MethodHandle that has been generated 
        // by jextract
        return Lib.fopen( 
                arena.allocateUtf8String(filename),
                arena.allocateUtf8String(mode));
    }
}

You can find the example code in the GitHub repository in the file HelloWorldJExtract.java. I integrated jextract via a wrapper directly into the Maven build process, so just run mvn package to invoke the tool.

More Information

There are many other resources on Project Panama, but be aware that they might be dated. Therefore, I recommend reading JEP 454, which describes the newly introduced API in great detail. Additionally, the talk “The Panama Dojo: Black Belt Programming with Java 21 and the FFM API” by Per Minborg at this year’s Devoxx Belgium is a great introduction, as is the talk by Maurizio Cimadamore at this year’s JVMLS.

Conclusion

Project Panama greatly simplifies interfacing with existing native libraries. I hope it will gain traction after leaving the preview state with the upcoming JDK 22, but it should already be stable enough for small experiments and side projects.

I hope my introduction gave you a glimpse into Panama; as always, I’m happy for any comments, and I’ll see you next week(ish) for the start of a new blog series.

This article is part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone. Thank you to my colleague Martin Dörr, who helped me with Panama and ported Panama to PowerPC.

Profiling Maven Projects with my IntelliJ Profiler Plugin

Posted on December 4, 2023 by Johannes Bechberger

Or: I just released version 0.0.11 with a cool new feature that I can’t wait to tell you about…

According to the recent JetBrains survey, most people use Maven as their build system and build Spring Boot applications with Java. Yet my profiling plugin for IntelliJ only supports profiling pure Java run configurations, i.e., configurations where the JVM gets passed the main class to run. This is great for tiny examples where you directly right-click on the main method and profile the whole application using the context menu:

But this is not great when you’re using the Maven build system and usually run your application using the exec goal, or, god forbid, use Spring Boot or Quarkus-related goals. Support for these goals has been requested multiple times, and last week, I got around to implementing it (while also fixing two other bugs). So now you can profile your Spring Boot application, like the Spring pet-clinic, running with spring-boot:run:

Giving you a profile like:

Or your Quarkus application running with quarkus:dev:

Giving you a profile like:

This works by using the options of these goals, which allow the profiler plugin to pass profiling-specific JVM options. If the plugin doesn’t detect a directly supported Maven plugin, it passes the JVM options via the MAVEN_OPTS environment variable. This should work with the exec goals and others.

Gradle script support has also been requested, but despite searching the whole internet late into the night, I didn’t find any way to add JVM options to the JVM that Gradle runs for the Spring Boot or run tasks without modifying the build.gradle file itself (see Baeldung).

I left when it was dark and rode out into the night with my bike. Visiting other lost souls in the pursuit of sweet potato curry.

Only Quarkus’s quarkusDev task has the proper options so that I can pass the JVM options. So, for now, I only have basic Quarkus support but nothing else. Maybe one of my readers knows how I could still provide profiling support for non-Quarkus projects.

You can configure the options that the plugin uses for specific task prefixes yourself in the .profileconfig.json file:

{
    "additionalGradleTargets": [
        {
            // example for Quarkus
            "targetPrefix": "quarkus",
            "optionForVmArgs": "-Djvm.args",
            "description": "Example quarkus config, adding profiling arguments via -Djvm.args option to the Gradle task run"
        }
    ],
    "additionalMavenTargets": [
        {   // example for Quarkus
            "targetPrefix": "quarkus:",
            "optionForVmArgs": "-Djvm.args",
            "description": "Example quarkus config, adding profiling arguments via -Djvm.args option to the Maven goal run"
        }
    ]
}

This update has been the first one with new features since April. The new features should make life easier for profiling both real-world and toy applications. If you have any other feature requests, feel free to create an issue on GitHub and, ideally, try to create a pull request. I’m happy to help you get started.

See you next week on some topics I have not yet decided on. I have far more ideas than time…

This article is part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone. Thanks to the issue reporters and all the other people who tried my plugin.

Finding all used Classes, Methods and Functions of a Python Module

Posted on December 1, 2023 by Johannes Bechberger

Another blog post in which I use sys.settrace. This time to solve a real problem.

When working with new modules, it is sometimes beneficial to get a glimpse of which entities of a module are actually used. I wrote something comparable in my blog post Instrumenting Java Code to Find and Handle Unused Classes, but this time, I need it in Python and with method-level granularity.

TL;DR

Download trace.py from GitHub and use it to print a call tree and a list of used methods and classes to the error output:

import trace
trace.setup(r"MODULE_REGEX", print_location_=True)

Implementation

This could be a hard problem, but it isn’t when we’re using sys.settrace to set a handler for every method and function call, reapplying the knowledge we gained in my Let’s create a debugger together series to develop a small utility.

There are essentially six different types of functions (this sample code is on GitHub):

def log(message: str):
    print(message)


class TestClass:
    # static initializer of the class
    x = 100

    def __init__(self):
        # constructor
        log("instance initializer")

    def instance_method(self):
        # instance method, self is bound to an instance
        log("instance method")

    @staticmethod
    def static_method():
        log("static method")

    @classmethod
    def class_method(cls):
        log("class method")


def free_function():
    log("free function")

This is important because we have to handle them differently in the following. But first, let’s define a few helpers and configuration variables:

indent = 0
module_matcher: str = ".*"
print_location: bool = False

We also want to print a method call-tree, so we use indent to track the current indentation level. The module_matcher is the regular expression that we use to determine whether we want to consider a module, its classes, and methods. This could, e.g., be __main__ to only consider the main module. The print_location tells us whether we want to print the path and line location for every element in the call tree.
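To get a feeling for how such a matcher behaves, here is a minimal sketch. It assumes that the regular expression is applied with re.match, as the trace handler further down does; the module_matches helper is hypothetical and only exists for this illustration:

```python
import re


def module_matches(module_matcher: str, module_name: str) -> bool:
    """ Hypothetical helper mirroring a re.match-based module filter """
    return re.match(module_matcher, module_name) is not None


# "__main__" selects the script passed to the interpreter
print(module_matches(r"__main__", "__main__"))      # True
print(module_matches(r"__main__", "json.decoder"))  # False

# re.match only anchors at the start of the string, so a package
# name also matches all of its submodules ...
print(module_matches(r"json", "json.decoder"))      # True

# ... unless the pattern is explicitly terminated with "$"
print(module_matches(r"json$", "json.decoder"))     # False
```

So a pattern like my_package (a made-up name) would probably also cover my_package.util, which is usually what we want when exploring a module.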

Now to the main helper class:

import sys
from dataclasses import dataclass, field
from typing import Dict, Set


def log(message: str):
    print(message, file=sys.stderr)


STATIC_INIT = "<static init>"

@dataclass
class ClassInfo:
    """ Used methods of a class """
    name: str
    used_methods: Set[str] = field(default_factory=set)

    def print(self, indent_: str):
        log(indent_ + self.name)
        for method in sorted(self.used_methods):
            log(indent_ + "  " + method)

    def has_only_static_init(self) -> bool:
        # compare the whole set instead of calling pop(),
        # which would remove the entry from used_methods
        return self.used_methods == {STATIC_INIT}

used_classes: Dict[str, ClassInfo] = {}
free_functions: Set[str] = set()

The ClassInfo stores the used methods of a class. We store the ClassInfo instances of used classes and the names of the used free functions in global variables.
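The code later in this post looks up classes via a get_class_info helper that is not shown here. The following is only a sketch of how such a helper might look, assuming that it creates the ClassInfo entry on first use; the actual implementation in trace.py may differ:

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ClassInfo:
    """ Used methods of a class (reduced to the fields needed here) """
    name: str
    used_methods: Set[str] = field(default_factory=set)


used_classes: Dict[str, ClassInfo] = {}


def get_class_info(class_name: str) -> ClassInfo:
    """ Assumed behavior: return the entry, creating it on first use """
    if class_name not in used_classes:
        used_classes[class_name] = ClassInfo(class_name)
    return used_classes[class_name]


get_class_info("__main__.TestClass").used_methods.add("__init__")
get_class_info("__main__.TestClass").used_methods.add("instance_method")
print(sorted(used_classes["__main__.TestClass"].used_methods))
# ['__init__', 'instance_method']
```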

Now to our call handler that we pass to sys.settrace:

import inspect
import re
from types import FrameType


def handler(frame: FrameType, event: str, *args):
    """ Trace handler that prints and tracks called functions """
    # find module name
    module_name: str = mod.__name__ if (
        mod := inspect.getmodule(frame.f_code)) else ""

    # get name of the code object
    func_name = frame.f_code.co_name

    # check that the module matches the defined regexp
    if not re.match(module_matcher, module_name):
        return
    
    # keep indent in sync
    # this is the only reason why we need
    # the return events and use an inner trace handler
    global indent
    if event == 'return':
        indent -= 2
        return
    if event != "call":
        return

    # insert the current function/method
    name = insert_class_or_function(module_name, func_name, frame)

    # print the current location if necessary
    if print_location:
        do_print_location(frame)
    
    # print the current function/method
    log(" " * indent + name)

    # keep the indent in sync
    indent += 2

    # return this as the inner handler to get
    # return events
    return handler


def setup(module_matcher_: str = ".*", print_location_: bool = False):
    # ...
    sys.settrace(handler)
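Why does the handler return itself? A stripped-down sketch shows the difference: the function passed to sys.settrace only receives the call event of every new frame, and whatever it returns becomes the local trace function of that frame, which then receives the return event. The names inner, outer, and record are made up for this example:

```python
import sys


def inner():
    return 1


def outer():
    return inner() + 1


def record(tracer):
    """ Run outer() under the given trace function """
    sys.settrace(tracer)
    try:
        outer()
    finally:
        sys.settrace(None)


events = []


def full_tracer(frame, event, arg):
    if event in ("call", "return"):
        events.append((event, frame.f_code.co_name))
    # returning the tracer registers it as local trace function
    return full_tracer


calls_only = []


def call_only_tracer(frame, event, arg):
    calls_only.append((event, frame.f_code.co_name))
    # no local trace function: we never see the return events
    return None


record(full_tracer)
record(call_only_tracer)
print(events)
# [('call', 'outer'), ('call', 'inner'), ('return', 'inner'), ('return', 'outer')]
print(calls_only)
# [('call', 'outer'), ('call', 'inner')]
```

Without the return events, we couldn’t decrease the indentation again, so the call tree would drift further to the right with every call.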

Now, we “only” have to get the name for the code object and collect it properly in either a ClassInfo instance or the set of free functions. The base case is easy: When the current frame contains a local variable self, we probably have an instance method, and when it contains a cls variable, we have a class method.

def insert_class_or_function(module_name: str, func_name: str,
                             frame: FrameType) -> str:
    """ Insert the code object and return the name to print """
    if "self" in frame.f_locals or "cls" in frame.f_locals:
        return insert_class_or_instance_function(module_name,
                                                 func_name, frame)
    # ...

def insert_class_or_instance_function(module_name: str,
                                      func_name: str,
                                      frame: FrameType) -> str:
    """
    Insert the code object of an instance or class function and
    return the name to print
    """
    class_name = ""

    if "self" in frame.f_locals:
        # instance methods
        class_name = frame.f_locals["self"].__class__.__name__

    elif "cls" in frame.f_locals:
        # class method
        class_name = frame.f_locals["cls"].__name__
        # we prefix the class method name with "<class>"
        func_name = "<class>" + func_name
    
    # add the module name to class name
    class_name = module_name + "." + class_name
    get_class_info(class_name).used_methods.add(func_name)
    
    # return the string to print in the class tree
    return class_name + "." + func_name
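The check on the local variables is only a heuristic, as self and cls are mere naming conventions. A small sketch, using sys._getframe to obtain the current frame and the made-up names classify, Example, and misleading, shows both where it works and where it can be fooled:

```python
import sys
from types import FrameType


def classify(frame: FrameType) -> str:
    """ Same local-variable check as in the tracer, reduced to a label """
    if "self" in frame.f_locals:
        return "instance method"
    if "cls" in frame.f_locals:
        return "class method"
    return "other"


class Example:
    def method(self):
        return classify(sys._getframe())

    @classmethod
    def create(cls):
        return classify(sys._getframe())

    @staticmethod
    def helper():
        return classify(sys._getframe())


def misleading(self):
    # a free function that merely names its parameter "self"
    return classify(sys._getframe())


print(Example().method())   # instance method
print(Example.create())     # class method
print(Example.helper())     # other
print(misleading(None))     # instance method (a false positive)
```

For an exploration utility, such false positives are probably acceptable, but they explain the “probably” in the description above.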

But how about the other three cases? We use the header line of a method to distinguish between them:

from enum import Enum
from pathlib import Path
from types import CodeType


class StaticFunctionType(Enum):
    INIT = 1
    """ static init """
    STATIC = 2
    """ static function """
    FREE = 3
    """ free function, not related to a class """


def get_static_type(code: CodeType) -> StaticFunctionType:
    file_lines = Path(code.co_filename).read_text().split("\n")
    line = code.co_firstlineno
    header_line = file_lines[line - 1]
    if "class " in header_line:
        # e.g. "class TestClass"
        return StaticFunctionType.INIT
    if "@staticmethod" in header_line:
        return StaticFunctionType.STATIC
    return StaticFunctionType.FREE

These are, of course, just approximations, but they work well enough for a small utility used for exploration.

If you know any other way that doesn’t involve using the Python AST, feel free to post in a comment below.

Using the get_static_type function, we can now finish the insert_class_or_function function:

def insert_class_or_function(module_name: str, func_name: str,
                             frame: FrameType) -> str:
    """ Insert the code object and return the name to print """
    if "self" in frame.f_locals or "cls" in frame.f_locals:
        return insert_class_or_instance_function(module_name,
                                                 func_name, frame)
    # get the type of the current code object
    t = get_static_type(frame.f_code)

    if t == StaticFunctionType.INIT:
        # static initializer, the top level class code
        # func_name is actually the class name here,
        # but classes are technically also callable function
        # objects
        class_name = module_name + "." + func_name
        get_class_info(class_name).used_methods.add(STATIC_INIT)
        return class_name + "." + STATIC_INIT
    
    elif t == StaticFunctionType.STATIC:
        # @staticmethod
        # the qualname is in our example TestClass.static_method,
        # so we have to drop the last part of the name to get
        # the class name
        class_name = module_name + "." + frame.f_code.co_qualname[
                                         :-len(func_name) - 1]
        # we prefix static method names with "<static>"
        func_name = "<static>" + func_name
        get_class_info(class_name).used_methods.add(func_name)
        return class_name + "." + func_name
 
    free_functions.add(frame.f_code.co_name)
    return module_name + "." + func_name
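The slicing of the qualified name in the static-method case can be tried out in isolation. This sketch uses the __qualname__ attribute of the function object, because the co_qualname attribute of code objects only exists since Python 3.11; class_name_of is a hypothetical helper for this illustration:

```python
class TestClass:
    @staticmethod
    def static_method():
        pass


def class_name_of(qualname: str, func_name: str) -> str:
    """ Drop the trailing ".<func_name>" from a qualified name """
    return qualname[:-len(func_name) - 1]


# the qualified name ends with "<class name>.<function name>"
qualname = TestClass.static_method.__qualname__
print(class_name_of(qualname, "static_method"))

# the same slicing on literal names, also for nested classes
print(class_name_of("TestClass.static_method", "static_method"))  # TestClass
print(class_name_of("Outer.Inner.helper", "helper"))              # Outer.Inner
```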

The final thing left to do is to register a teardown handler to print the collected information on exit:

def teardown():
    """ Teardown the tracer and print the results """
    sys.settrace(None)
    log("********** Trace Results **********")
    print_info()


# trigger teardown on exit
atexit.register(teardown)

Usage

We now prefix our sample program from the beginning with

import trace

trace.setup(r"__main__")

to collect all information for the __main__ module, which is the module directly passed to the Python interpreter.

We append to our program some code to call all methods/functions:

def all_methods():
    log("all methods")
    TestClass().instance_method()
    TestClass.static_method()
    TestClass.class_method()
    free_function()


all_methods()

Our utility library then prints the following upon execution:

standard error:

    __main__.TestClass.<static init>
    __main__.all_methods
      __main__.log
      __main__.TestClass.__init__
        __main__.log
      __main__.TestClass.instance_method
        __main__.log
      __main__.TestClass.<static>static_method
        __main__.log
      __main__.TestClass.<class>class_method
        __main__.log
      __main__.free_function
        __main__.log
    ********** Trace Results **********
    Used classes:
      only static init:
      not only static init:
       __main__.TestClass
         <class>class_method
         <static init>
         <static>static_method
         __init__
         instance_method
    Free functions:
      all_methods
      free_function
      log

standard output:

    all methods
    instance initializer
    instance method
    static method
    class method
    free function

Conclusion

This small utility uses the power of sys.settrace (and some string processing) to find a module’s used classes, methods, and functions and the call tree. The utility is pretty helpful when trying to grasp the inner structure of a module and the module entities used transitively by your own application code.

I published this code under the MIT license on GitHub, so feel free to improve, extend, and modify it. Come back in a few weeks to see why I actually developed this utility…

This article is part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone.

Let’s create a Python Debugger together: PyData Talk

Posted on November 17, 2023 by Johannes Bechberger

A small addendum to the previous five parts of my journey down the Python debugger rabbit hole (part 1, part 2, part 3, part 4, and part 5).

I gave a talk on this topic, based on my blog posts, at PyData Karlsruhe:

You can find all the source code of the demos here. It was a great pleasure giving this talk, and the audience received it well.

This might be the end of my journey into Python debuggers, but I feel some untold topics are out there. So, if you have any ideas, feel free to comment. See you in my next blog post and possibly at the next Python conference that accepts my talk proposal.

The presentation was part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone.

Let’s create a Python Debugger together: Tiny Addendum (exec and name)

Posted on November 14, 2023 by Johannes Bechberger

A small addendum to the previous four parts of my journey down the Python debugger rabbit hole (part 1, part 2, part 3, and part 4).

I tried the debugger I finished last week on a small sample application for my upcoming talk at the PyData Südwest meet-up, and it failed. The problem is related to running the file passed to the debugger. Consider that we debug the following program:

def main():
    print("Hi")

if __name__ == "__main__":
    main()

We now set a breakpoint in the main method when starting the debugger and continue with the execution of the program. The problem: It never hits the breakpoint. But why? Because it never calls the main method.

The cause of this problem is that the __name__ variable is set to dbg2.py (the file containing the code that compiles and runs the script). But how do we run the script? We use the following (based on a Real Python article):

_globals = globals().copy()

# ...

class Dbg:

    # ...

    def run(self, file: Path):
        """ Run a given file with the debugger """
        self._main_file = file
        # see https://realpython.com/python-exec/#using-python-for-configuration-files
        compiled = compile(file.read_text(), filename=str(file), mode='exec')
        sys.argv.pop(0)
        sys.breakpointhook = self._breakpoint
        self._process_compiled_code(compiled)
        exec(compiled, _globals)

This code uses the compile function to compile the code, telling it via the filename argument that the code belongs to the program file.

The mode argument specifies what kind of code must be compiled; it can be 'exec' if source consists of a sequence of statements, 'eval' if it consists of a single expression, or 'single' if it consists of a single interactive statement (in the latter case, expression statements that evaluate to something other than None will be printed).
Python Documentation for The Mode Argument of the Compile Method
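The three modes can be compared with a few lines of code. This is just a small sketch; "<demo>" is an arbitrary file name chosen for the example:

```python
import contextlib
import io

# 'exec': a sequence of statements, run for its side effects
namespace = {}
exec(compile("x = 20\ny = x + 1", "<demo>", "exec"), namespace)
print(namespace["y"])  # 21

# 'eval': a single expression whose value is returned
print(eval(compile("20 + 1", "<demo>", "eval")))  # 21

# statements are not allowed in 'eval' mode
try:
    compile("x = 20", "<demo>", "eval")
except SyntaxError:
    print("an assignment is not an expression")

# 'single': one interactive statement, results other than None
# are printed like in the REPL
captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    exec(compile("20 + 1", "<demo>", "single"))
print(repr(captured.getvalue()))  # '21\n'
```

A whole program file consists of many statements, so 'exec' is the mode that fits our debugger.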

We then remove the first argument of the program because it is the debugged file in the case of the debugger and run some post-processing on the compiled code object. This is the reason why we can’t just use eval. Finally, we use exec to execute the compiled code with the global variables that we had before creating the Dbg class and others.

The problem is that exec doesn’t set the import-related module attributes, such as __name__ and __file__, properly. So we have to emulate these by adding global variables:

        exec(compiled, _globals | 
                      {"__name__": "__main__", "__file__": str(file)})

It makes sense, of course, that exec behaves this way, as it is normally used to evaluate code in the current context.
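The effect is easy to reproduce without the debugger. In this sketch, the string "dbg2" merely stands in for whatever __name__ the copied globals contain, and example.py is a made-up file name:

```python
source = '''
called = False

def main():
    global called
    called = True

if __name__ == "__main__":
    main()
'''
compiled = compile(source, filename="example.py", mode="exec")


def run(extra_globals: dict) -> bool:
    """ Execute the program and report whether main() was called """
    # "dbg2" is an assumption standing in for the debugger's own name
    namespace = {"__name__": "dbg2"} | extra_globals
    exec(compiled, namespace)
    return namespace["called"]


# with the debugger's globals, the guard fails
print(run({}))  # False

# emulating the module attributes makes the guard succeed
print(run({"__name__": "__main__", "__file__": "example.py"}))  # True
```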

With this now fixed, it is possible to debug normal applications like the line counter that I use in my upcoming talk on the 16th of November in Karlsruhe.

I hope you liked this short addendum and see you next time with a blog post on something more Java-related.

Let’s create a Python Debugger together: Part 4 (Python 3.12 edition)

Posted on November 10, 2023 by Johannes Bechberger

The fourth part of my journey down the Python debugger rabbit hole (part 1, part 2, and part 3).

In this article, we’ll be looking into how changes introduced in Python 3.12 can help us with one of the most significant pain points of our current debugger implementation: The Python interpreter essentially calls our callback at every line of code, regardless of whether we have a breakpoint in the currently running method. But why is this the case?


Let’s create a Python Debugger together: Part 3 (Refactoring)

Posted on November 6, 2023 by Johannes Bechberger

This is the necessary third part of my journey down the Python debugger rabbit hole; if you’re new to the series, please take a look at part 1 and part 2 first.

I promised in the last part of this series that I’ll show you how to use the new Python APIs. However, some code refactoring is necessary before I can finally proceed. The implementation in dbg.py mixes the sys.settrace related code and code that can be reused for other debugging implementations. So, this is a short blog post covering the result of the refactoring. The code can be found in dbg2.py.


Loom is just HyperThreading in Java

Posted on October 23, 2023 by Johannes Bechberger

While sitting in Cay Horstmann’s “Looming Changes in Java Concurrency” talk at BaselOne, I had an epiphany: Aren’t virtual threads with Loom just a version of HyperThreading on the JVM?

Both try to utilize a computation resource fully, be it hardware core or platform thread, by multiplexing multiple tasks onto it, despite many tasks waiting regularly for IO operations to complete:

When one task waits, another can be scheduled, improving overall throughput. This works especially well when longer IO operations follow short bursts of computation.

There are, of course, differences between the two, most notably: HyperThreading doesn’t need the tasks to cooperate, as Loom does, so a virtual core can’t starve other virtual cores. Also noteworthy is that the scheduler for HyperThreading is implemented in silicon and cannot be configured or even changed, while the virtual thread execution can be tailored to one’s needs.

I hope you found this small insight helpful in understanding virtual threads and putting them into context. You can find more about these topics in resources like JEP 444 (Virtual Threads) and the “Hyper-Threading Technology Architecture and Microarchitecture” paper.

This article is part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone.

Putting JFR into Context

Posted on October 19, 2023 by Johannes Bechberger

Have you ever wanted to bring your JFR events into context? Adding information on sessions, user IDs, and more can improve your ability to make sense of all the events in your profile. Currently, we can only add context by creating custom JFR events, as I presented in my Profiling Talks:

We can use these custom events (see Custom JFR Events: A Short Introduction and Custom Events in the Blocky World: Using JFR in Minecraft) to store away the information and later relate them to all the other events by using the event’s time, duration, and thread. This works out-of-the-box but has one major problem: Relating events is quite fuzzy, as time stamps are not as accurate (see JFR Timestamps and System.nanoTime), and we do all of this in post-processing.

But couldn’t we just attach some context to every JFR event we’re interested in? Not yet, but Jaroslav Bachorik from DataDog is working on it. Recently, he wrote three blog posts (1, 2, 3). The following is a different take on his idea, showing how to use it in a small file server example.

The main idea of Jaroslav’s approach is to store a context in thread-local memory and attach it to every JFR event as configured. But before I dive into the custom context, I want to show you the example program, which you can find, as always, MIT-licensed on GitHub.

Example

We create a simple file server via Javalin, which allows a user to

Register (URL schema register/{user})
Store data in a file (store/{user}/{file}/{content})
Retrieve file content (load/{user}/{file})
Delete files (delete/{user}/{file})

The URLs are simple to use, and we don’t bother with error handling, user authentication, or large files, as this would complicate our example. I leave this as an exercise for the inclined reader. The most essential part of the application is the server declaration:

FileStorage storage = new FileStorage();                                                               
try (Javalin lin = Javalin.create(conf -> {                                                            
            conf.jetty.server(() ->                                                                    
                    new Server(new QueuedThreadPool(4))                                                
            );                                                                                         
        })                                                                                             
        .exception(Exception.class, (e, ctx) -> {                                                      
            ctx.status(500);                                                                           
            ctx.result("Error: " + e.getMessage());                                                    
            e.printStackTrace();                                                                       
        })                                                                                             
        .get("/register/{user}", ctx -> {                                                              
            String user = ctx.pathParam("user");                                                       
            storage.register(user);                                                                    
            ctx.result("registered");                                                                  
        })                                                                                             
        .get("/store/{user}/{file}/{content}", ctx -> {                                                
            String user = ctx.pathParam("user");                                                       
            String file = ctx.pathParam("file");                                                       
            storage.store(user, file, ctx.pathParam("content"));                                       
            ctx.result("stored");                                                                      
        })                                                                                             
        .get("/load/{user}/{file}", ctx -> {                                                           
            String user = ctx.pathParam("user");                                                       
            String file = ctx.pathParam("file");                                                       
            ctx.result(storage.load(user, file));                                                      
        })                                                                                             
        .get("/delete/{user}/{file}", ctx -> {                                                         
            String user = ctx.pathParam("user");                                                       
            String file = ctx.pathParam("file");                                                       
            storage.delete(user, file);                                                                
            ctx.result("deleted");                                                                     
        })) {                                                                                          
    lin.start(port);                                                                                   
    Thread.sleep(100000000);                                                                           
} catch (InterruptedException ignored) {                                                               
}

This example runs on Jaroslav’s OpenJDK fork (commit 6ea2b4f), so if you want to run it in its complete form, please build the fork and make sure that your PATH and JAVA_HOME environment variables are set accordingly.

You can build the server using mvn package and start it, listening on port 1000, via:

java -jar target/jfr-context-example.jar 1000

You can then use it via your browser or curl:

# start the server
java -XX:StartFlightRecording=filename=flight.jfr,settings=config.jfc \
-jar target/jfr-context-example.jar 1000 &
pid=$!

# register a user
curl http://localhost:1000/register/moe

# store a file
curl http://localhost:1000/store/moe/hello_file/Hello

# load the file
curl http://localhost:1000/load/moe/hello_file
-> Hello

# delete the file
curl http://localhost:1000/delete/moe/hello_file

kill $pid

# this results in the flight.jfr file

To make testing easier, I created the test.sh script, which starts the server, registers a few users and stores, loads, and deletes a few files, creating a JFR file along the way. We're using a custom JFR configuration to enable the IO events without any threshold. This is not recommended for production but is required in our toy example to get any such event:

<?xml version="1.0" encoding="UTF-8"?>

<configuration version="2.0" label="Custom" description="Custom config for the example"
  provider="Johannes Bechberger">
    <event name="jdk.FileRead" withContext="true">
        <setting name="enabled">true</setting>
        <setting name="stackTrace">true</setting>
        <setting name="threshold" control="file-threshold">0 ms</setting>
    </event>

    <event name="jdk.FileWrite" withContext="true">
        <setting name="enabled">true</setting>
        <setting name="stackTrace">true</setting>
        <setting name="threshold" control="file-threshold">0 ms</setting>
    </event>
</configuration>
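
If you want to experiment without a configuration file, the threshold and stack trace settings can also be expressed with the standard jdk.jfr.Recording API. The following is only a minimal sketch under that assumption: the class name ZeroThresholdRecording and its record helper are made up for illustration, and the fork-specific withContext attribute is not covered, since I only set the options available in a stock JDK:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.List;
import java.util.concurrent.Callable;

import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

// Hypothetical helper, not part of the example project
class ZeroThresholdRecording {

    /** Records while the action runs and returns all recorded events. */
    static List<RecordedEvent> record(Callable<?> action) throws Exception {
        Path dump = Files.createTempFile("flight", ".jfr");
        try (Recording recording = new Recording()) {
            // same settings as in the XML: enabled, stack traces, 0 ms threshold
            recording.enable("jdk.FileRead").withStackTrace().withThreshold(Duration.ZERO);
            recording.enable("jdk.FileWrite").withStackTrace().withThreshold(Duration.ZERO);
            recording.start();
            action.call();
            recording.stop();
            recording.dump(dump);
            // the events are fully loaded into memory, so the dump can be deleted
            return RecordingFile.readAllEvents(dump);
        } finally {
            Files.deleteIfExists(dump);
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("hello", ".txt");
        List<RecordedEvent> events = record(() -> {
            Files.writeString(file, "Hello");
            return Files.readString(file);
        });
        Files.deleteIfExists(file);
        // print the recorded file events with their paths
        events.stream()
              .filter(e -> e.getEventType().getName().equals("jdk.FileRead")
                        || e.getEventType().getName().equals("jdk.FileWrite"))
              .forEach(e -> System.out.println(
                      e.getEventType().getName() + " " + e.getString("path")));
    }
}
```

Which events show up depends on the JDK version and on how the file is accessed, so treat the printed list as exploratory rather than as a specification.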

We can use the jfr tool to easily print all the jdk.FileRead events from the created flight.jfr file in JSON format:

jfr print --events jdk.FileRead --json flight.jfr

This prints a list of events like:

{
  "type": "jdk.FileRead", 
  "values": {
    "startTime": "2023-10-18T14:31:56.369071625+02:00", 
    "duration": "PT0.000013042S", 
    "eventThread": {
      "osName": "qtp2119992687-32", 
      ...
    }, 
    "stackTrace": {
      "truncated": false, 
      "frames": [...]
    }, 
    "path": "\/var\/folders\/nd\/b8fyk_lx25b1ndyj4kmb2hk403cmxz\/T\/tmp13266469351066000997\/moe\/test_1", 
    "bytesRead": 8, 
    "endOfFile": false
  }
}

You can find more information on this and other events in my JFR Event Collection.

There are, of course, other events, but in our file server example, we’re only interested in file events for now (this might change as Jaroslav adds more features to his fork).

Now, we can start bringing the events into context.

Adding Custom Context

Before we can add the context, we have to define it, as described in Jaroslav’s blog post. We create a context that stores the current user, action, trace ID, and optional file:

@Name("tracer-context")
@Description("Tracer context type tuple")
public class TracerContextType extends ContextType implements AutoCloseable {

    private static final AtomicLong traceIdCounter = new AtomicLong(0);

    // attributes are defined as plain public fields annotated by at least @Name annotation
    @Name("user")
    @Description("Registered user")
    public String user;

    @Name("action")
    @Description("Action: register, store, load, delete")
    public String action;

    @Name("file")
    @Description("File if passed")
    public String file;

    // currently no primitives allowed here
    @Name("trace")
    public String traceId;

    public TracerContextType(String user, String action, String file) {
        this.user = user;
        this.action = action;
        this.file = file;
        this.traceId = "" + traceIdCounter.incrementAndGet();
        this.set();
    }

    public TracerContextType(String user, String action) {
        this(user, action,"");
    }

    @Override
    public void close() throws Exception {
        unset();
    }
}

A context has to be set and then later unset, which can be cumbersome in the face of exceptions. Implementing the AutoCloseable interface solves this by allowing us to wrap code in a try-with-resources statement:

try (var t = new TracerContextType(/* ... */)) {
    // ...
}
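
To see why this pattern is robust against exceptions, here is a small self-contained sketch. It does not use the fork's ContextType; DemoContext is a hypothetical stand-in that only logs where set() and unset() would be called, so it runs on any recent JDK:

```java
import java.util.ArrayList;
import java.util.List;

class ScopedContextSketch {

    // log of the calls, so that their order is observable (illustration only)
    static final List<String> log = new ArrayList<>();

    /** Hypothetical stand-in for a context type; NOT the fork's ContextType. */
    static class DemoContext implements AutoCloseable {
        final String user;

        DemoContext(String user) {
            this.user = user;
            log.add("set " + user);    // corresponds to set() in the constructor
        }

        @Override
        public void close() {
            log.add("unset " + user);  // corresponds to unset() in close()
        }
    }

    static void handle(String user, boolean fail) {
        try (var ctx = new DemoContext(user)) {
            log.add("work " + user);
            if (fail) {
                throw new IllegalStateException("request failed");
            }
        } // close() runs here, whether or not the body threw
    }

    public static void main(String[] args) {
        handle("moe", false);
        try {
            handle("larry", true);
        } catch (IllegalStateException e) {
            log.add("caught");
        }
        // prints [set moe, work moe, unset moe, set larry, work larry, unset larry, caught]
        System.out.println(log);
    }
}
```

The context of the failing request is unset before the exception reaches the caller, so no context leaks into later work on the same thread.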

All JFR events with enabled context that happen in the body of the statement are associated with the TracerContextType instance. We can wrap the code of all request handlers in our server in such a construct, e.g.:

.get("/store/{user}/{file}/{content}", ctx -> {                 
    String user = ctx.pathParam("user");                        
    String file = ctx.pathParam("file");                        
    try (var t = new TracerContextType(user, "store", file)) {  
        storage.store(user, file, ctx.pathParam("content"));    
        ctx.result("stored");                                   
    }                                                           
})

One last thing before we can analyze the annotated events: JFR has to know about your context before the recording starts. We do this by creating a registration class registered as a service.

@AutoService(ContextType.Registration.class)
public class TraceContextTypeRegistration implements ContextType.Registration {

    @Override
    public Stream<Class<? extends ContextType>> types() {
        return Stream.of(TracerContextType.class);
    }
}

We use the auto-service project by Google to automatically create the required build files (read more in this blog post by Pedro Rijo).

Using the Custom Context

After adding the context, we can see it in the jdk.FileRead events:

{
  "type": "jdk.FileRead", 
  "values": {
    "startTime": "2023-10-18T14:31:56.369071625+02:00", 
    "duration": "PT0.000013042S", 
    "eventThread": {
      "osName": "qtp2119992687-32", 
      ...
    }, 
    "stackTrace": {
      "truncated": false, 
      "frames": [...]
    }, 
    "tracer-context_user": "moe", 
    "tracer-context_action": "load", 
    "tracer-context_file": "test_1", 
    "tracer-context_trace": "114", 
    "path": "\/var\/folders\/nd\/b8fyk_lx25b1ndyj4kmb2hk403cmxz\/T\/tmp13266469351066000997\/moe\/test_1", 
    "bytesRead": 8, 
    "endOfFile": false
  }
}

We clearly see the stored context information (tracer-context_*).

Using the jq tool, we can analyze the events, like calculating how many bytes the server has read for each user:

➜ jfr print --events jdk.FileRead --json flight.jfr |
  jq -r '
    .recording.events
    | group_by(.values."tracer-context_user")
    | map({
      user: .[0].values."tracer-context_user",
      bytesRead: (map(.values.bytesRead) | add)
    })
   | map([.user, .bytesRead])
   | ["User", "Bytes Read"]
   , .[]
   | @tsv
 '
User    Bytes Read
        3390101
bob     80
curly   100
frank   100
joe     80
john    90
larry   100
mary    90
moe     80
sally   100
sue     80

The empty user is for all the bytes read unrelated to any specific user (like class files), which is quite helpful.
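
If you prefer to stay in Java, the same aggregation can be sketched with the JFR consumer API. This is only a sketch under assumptions: the class BytesReadPerUser and its Read record are made up for illustration, and I assume that the fork exposes the context under the same field name (tracer-context_user) that appears in the JSON output above:

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

// Hypothetical helper, not part of the example project
class BytesReadPerUser {

    /** Minimal view of a jdk.FileRead event: user from the context plus bytes read. */
    record Read(String user, long bytesRead) {}

    /** Sums bytesRead per user, sorted by user name. */
    static Map<String, Long> sumPerUser(List<Read> reads) {
        return reads.stream().collect(Collectors.groupingBy(
                Read::user, TreeMap::new, Collectors.summingLong(Read::bytesRead)));
    }

    /** Loads all jdk.FileRead events; the context field name is an assumption. */
    static List<Read> load(Path jfrFile) throws IOException {
        String userField = "tracer-context_user";
        return RecordingFile.readAllEvents(jfrFile).stream()
                .filter(e -> e.getEventType().getName().equals("jdk.FileRead"))
                .map(e -> new Read(userOf(e, userField), e.getLong("bytesRead")))
                .toList();
    }

    static String userOf(RecordedEvent e, String field) {
        // events outside of any context have no (or a null) user
        String user = e.hasField(field) ? e.getString(field) : null;
        return user == null ? "" : user;
    }

    public static void main(String[] args) throws IOException {
        Path file = Path.of(args.length > 0 ? args[0] : "flight.jfr");
        sumPerUser(load(file))
                .forEach((user, bytes) -> System.out.println(user + "\t" + bytes));
    }
}
```

Keeping the aggregation in sumPerUser separate from the file handling makes it easy to try out on hand-written data before pointing it at a real recording.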

Conclusion

This small example is just a glimpse of what is possible with JFR contexts. Jaroslav’s prototypical implementation is still limited (for example, it doesn’t support contexts at method sampling events), but it is already a significant improvement over the status quo. I’ll be creating follow-up blog posts as the prototype evolves and matures.

Thanks for coming this far, and see you next week for another blog post and maybe at a meet-up or conference (see Talks).

This article is part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone.

Let’s create a Python Debugger together: Part 2

Posted on October 6, 2023 by Johannes Bechberger

The second part of my journey down the Python debugger rabbit hole.

In this blog post, we extend and fix the debugger we created in part 1: We add the capability to

single-step through code, stepping over lines, into function calls, and out of functions,
and add conditions to breakpoints.

You can find the resulting MIT-licensed code on GitHub in the python-dbg repository in the file dbg.py. I added a README so you can glance at how to use it.
