Hello eBPF: Writing a Lottery Scheduler in Pure Java with bpf_for_each Support (18)

Welcome back to my hello-ebpf series. Last week, we learned about lottery schedulers and how to implement one with sched-ext and hello-ebpf. But there was one major issue when writing the scheduler in Java: hello-ebpf lacked support for the bpf_for_each macro. This is what this weeks short blog post is all about.

To loop over all elements in the scheduling queue, to pick a random one, we had to write inline C code. A simplified version without scheduling priorities looks like the following:

@Override
public void dispatch(int cpu, Ptr<TaskDefinitions.task_struct> prev) {
    String CODE = """
        // pick a random task struct index
        s32 random = bpf_get_prandom_u32() % 
            scx_bpf_dsq_nr_queued(SHARED_DSQ_ID);
        struct task_struct *p = NULL;
        bpf_for_each(scx_dsq, p, SHARED_DSQ_ID, 0) {
            // iterate till we get to this number
            random = random - 1;
            // and try to dispatch it
            if (random <= 0 && 
              tryDispatching(BPF_FOR_EACH_ITER, p, cpu)) {
                return 0;
            }
       };
       return 0;
       """;
}

This surely works, but inline C code is not the most readable and writing it is error-prone. I’ve written on this topic in my post Hello eBPF: Write your eBPF application in Pure Java (12) before. So what was keeping us from supporting the bpf_for_each macro? Support for lambdas as arguments to built-in bpf functions.

Lottery Scheduler in Pure Java

May I introduce you know the support for directly passed lambda expressions: This let’s us define a bpf_for_each_dsq function and to write a simple lottery scheduler in pure Java:

@BPF(license = "GPL")
@Property(name = "sched_name", value = "lottery_stopping_scheduler")
public abstract class LotteryScheduler extends BPFProgram 
  implements Scheduler {

    private static final int SHARED_DSQ_ID = 0;

    @Override
    public int init() {
        return scx_bpf_create_dsq(SHARED_DSQ_ID, -1);
    }

    @Override
    public void enqueue(
      Ptr<TaskDefinitions.task_struct> p, long enq_flags) {
        var sliceLength = ((@Unsigned int) 5_000_000) / 
            scx_bpf_dsq_nr_queued(SHARED_DSQ_ID);
        scx_bpf_dispatch(p, SHARED_DSQ_ID, sliceLength, enq_flags);
    }

    @BPFFunction
    @AlwaysInline
    public boolean tryDispatching(
      Ptr<BpfDefinitions.bpf_iter_scx_dsq> iter, 
      Ptr<TaskDefinitions.task_struct> p, int cpu) {
        // check if the CPU is usable by the task
        if (!bpf_cpumask_test_cpu(cpu, p.val().cpus_ptr)) {
            return false;
        }
        return scx_bpf_dispatch_from_dsq(iter, p, 
          SCX_DSQ_LOCAL_ON.value() | cpu, SCX_ENQ_PREEMPT.value());
    }

    /**
     * Dispatch tasks
     * <p/>
     * Pick a random task from the shared DSQ and 
     * try to dispatch it
     */
    @Override
    public void dispatch(int cpu, 
      Ptr<TaskDefinitions.task_struct> prev) {
        Box<Integer> random = Box.of(bpf_get_prandom_u32() 
                            % scx_bpf_dsq_nr_queued(SHARED_DSQ_ID));
        Ptr<TaskDefinitions.task_struct> p = null;
        bpf_for_each_dsq(SHARED_DSQ_ID, p, iter -> {
            random.set(random.val() - 1);
            if (random.val() <= 0 && tryDispatching(iter, p, cpu)) {
                return;
            }
        });
    }

    public static void main(String[] args) throws Exception {
        try (LotteryScheduler scheduler = 
             BPFProgram.load(LotteryScheduler.class)) {
            scheduler.attachScheduler();
            while (scheduler.isSchedulerAttachedProperly()) {
                Thread.sleep(1000);
            }
        }
    }
}

To run it yourself, clone the hello-ebpf project, follow the setup instructions of the taskcontrol repository (at least a 6.12 kernel) and run the following:

sudo -s PATH=$PATH ./run.sh LotteryScheduler

bpf_for_each_dsq Macro Support

But how is the new bpf_for_each_dsq function, modelling one instance of the bpf_for_each macro modeled? It uses the templates of the @BuiltinBPFFunction annotation and is implemented in the Scheduler class:

/**
 * Iterate over all tasks in the DSQ, using the {@code bpf_for_each} 
 * macro.
 * <p>
 * This inserts the body of the lambda into the macro,
 * so {@code return} works differently,
 * for {@code break;} use {@link BPFJ#_break()} and for 
 * {@code continue;} use {@link BPFJ#_continue()}.
 * Use {@link me.bechberger.ebpf.type.Box} for 
 * using non-final variables from outside the lambda.
 * @param dsq_id queue id
 * @param cur current task in the loop
 * @param body lambda to execute for each task
 */
@BuiltinBPFFunction("""
  bpf_for_each(scx_dsq, $arg2, $arg1, 0) {
      $lambda3:param1:type $lambda3:param1:name = BPF_FOR_EACH_ITER;
      $lambda3:code
  }
  """)
default void bpf_for_each_dsq(int dsq_id,
  Ptr<TaskDefinitions.task_struct> cur,
  Consumer<Ptr<BpfDefinitions.bpf_iter_scx_dsq>> body) {
    throw new MethodIsBPFRelatedFunction();
}

The template might look cryptic, but it uses the newest additions to the template placeholders, introduced in Hello eBPF: Write your eBPF application in Pure Java (12):

$lambdaM:code: the code of the M-th lambda (starting at one), use Box for using variables inside and outside
$lambdaM:paramN: variable declaration for parameter N of the M-th lambda
$lambdaM:paramN:type: type of the parameter N of the M-th lambda
$lambdaM:paramN:name: name of the parameter N of the M-th lambda
for the sake of completeness: $argN: Argument N

So $lambda3:param1:type $lambda3:param1:name is thereby equivalent to “[type of the first parameter of the lambda argument 3 or body consumer] [name of the same parameter]”.

Box Class, Local Variables and Lambdas

But what is the Box class that is mentioned and used with this function? It’s a new built-in wrapper class that solves a fundamental problem with lambdas:

Any local variable, formal parameter, or exception parameter used but not declared in a lambda expression must either be final or effectively final (§4.12.4), as specified in §6.5.6.1.
Section 15.27.2 of the JaVa Language Specification

The reason for this lies in the implementation of lambdas in Java, we can approximate the lambda from the lottery scheduler as follows:

class LotteryLambda implements Consumer<task_struct> {
    private Box<Integer> random;
    private int cpu;
    private Ptr<task_struct> p;
    
    public LotteryLambda(Box<Integer> random, int cpu, 
      Ptr<task_struct> p) {
        this.random = random;
        this.cpu = cpu;
        this.p = p;
    }
    
    @Override
    public void accept(Ptr<BpfDefinitions.bpf_iter_scx_dsq> iter) {
        random.set(random.val() - 1);
        if (random.val() <= 0 && tryDispatching(iter, p, cpu)) {
            return;
        }
    }
}

// lambda application
new LotteryLambda(random, cpu, p).accept(iter)

Altering any value of a LotteryLambda inside the accept method doesn’t change its value outside of the lambda. But boxing the value means that both the code inside and outside the accept method access the same Box object all the time. Because the Box object is effectively final, but its content isn’t.

The rules of the language specification prevent any confusion related to local variables, to make it as consistent as possible.

In hello-ebpf, we take our liberty with semantics, as with the function templates, we just replace $lambdaM:code with the actual body of the lambda, so no such rules are needed in the C code. The Box class alleviates this by wrapping any value (and I used a similar class in many other projects):

@Type
public class Box<T> {

    private T value;

    @BuiltinBPFFunction("$arg1")
    public Box(T value) {
        this.value = value;
    }

    @BuiltinBPFFunction("$arg1")
    public static <T> Box<T> of(T value) {
        return new Box<>(value);
    }

    @BuiltinBPFFunction("$this")
    public T val() {
        return this.value;
    }

    @BuiltinBPFFunction("$this = $arg1")
    public void set(T value) {
        this.value = value;
    }
}

Be aware that the Box is specially handled in the compiler plugin of hello-ebpf, because the @Type annotation does not support template arguments. Any usage of the Box type is just replaced by its containing value and type, when generating C code:

Box<Integer> box = Box.of(42);
int value = box.val();
box.set(43);

Results in the following C code:

s32 box = 42;
s32 value = box;
box = 43;

Be aware that you cannot use the Box outside of method bodies, so don’t pass it directly to methods or store it in fields or global variables.

Conclusion

This blog post showed, how we can implement a scheduler in pure Java, extending the function template mechanism along the way. This shows how powerful the @BuiltinBPFFunction annotation with its templates is and how it can be used to both model built-in functions and macros.

Thank you for joining me on this journey to bring eBPF and sched-ext to the Java world.

My parents 19 year old cat sitting in her basket, while I type this blog post on eBPF.

Author

Johannes Bechberger

Johannes Bechberger is a JVM developer working on profilers and their underlying technology in the SapMachine team at SAP. This includes improvements to async-profiler and its ecosystem, a website to view the different JFR event types, and improvements to the FirefoxProfiler, making it usable in the Java world. He started at SAP in 2022 after two years of research studies at the KIT in the field of Java security analyses. His work today is comprised of many open-source contributions and his blog, where he writes regularly on in-depth profiling and debugging topics, and of working on his JEP Candidate 435 to add a new profiling API to the OpenJDK.
View all posts

New posts like these come out at least every two weeks, to get notified about new posts, follow me on BlueSky, Twitter, Mastodon, or LinkedIn, or join the newsletter:

Mostly nerdless

Every two weeks a text on profiling, debugging or eBPF