Goodbye threads, hello workers

September 22, 2024

Musings

The big picture in JeeH is still not right: I don’t like the way device drivers run in an exception-centric “handler” mode, whereas the rest of the application uses “thread” mode. This distinction was needed to provide atomic guards around certain parts of JeeH’s core data structures. This also affects the app-supplied lowestPower and resumePower functions, used to enter sleep modes “when there is no work”. It’s a bit odd that we need to be in a special mode when … idling!

The current approach is essentially “pessimistic”: always guard and lock, just in case an interrupt might be triggered, to make sure that it stays out of our way. An alternative is to turn things completely around: instead of locking out interrupts by always calling <driver>::finish() from the PendSV exception handler, an IRQ handler could generate a message and keep it queued until there is no risk of interference.

In other words: instead of an application anxiously guarding its data structures, all IRQ handlers will now postpone their actions when they detect that the application is not ready for them. This implies that an IRQ handler needs to know when the app is “ready” for it and when not.

Workers

This is where “workers” come in: a worker is an object which takes care of a specific aspect of processing. This can be managing hardware, such as serial ports, SPI, I2C, etc. or it can be an application-specific activity, such as reading out a sensor, performing periodic checks, background calculations and statistics, or managing some communication protocol.

Workers are implemented as a subclass of JeeH’s Worker type. Everything in an application is implemented as part of some worker and JeeH always tracks which worker is currently running. Workers are created as global objects, with a small amount of code in main() to start things up.

Workers lead to an event-driven style of programming: all workers have a process method which is called whenever a message for that worker needs to be dealt with.

Events

The messages exchanged between workers and IRQ handlers are much simpler than the current Message in JeeH, and will be called events from now on, also to stress the event-driven approach.

An event is a tiny object which fits in a single 32-bit word. It only contains an 8-bit destination worker ID (eDst), an 8-bit tag (eTag), and a 16-bit payload (eVal). Events can be efficiently passed around by value.
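As a rough illustration of that layout, here is a minimal sketch of such an event type. The field names eDst, eTag, and eVal come from the description above; the field order, the pack/unpack helpers, and the demo function are assumptions made for this example, not JeeH's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical layout: only the three field names are taken from the text.
struct Event {
    uint8_t  eDst;   // destination worker ID
    uint8_t  eTag;   // what is being requested or reported
    uint16_t eVal;   // small payload
};

static_assert(sizeof (Event) == 4, "an Event should fit in one 32-bit word");

// Assumed helpers: copy an event to and from a plain 32-bit word,
// e.g. to store it in a queue slot.
inline uint32_t pack (Event e) {
    uint32_t w;
    std::memcpy(&w, &e, sizeof w);
    return w;
}

inline Event unpack (uint32_t w) {
    Event e;
    std::memcpy(&e, &w, sizeof e);
    return e;
}

// Usage check: an event survives a round trip through a 32-bit word.
inline void eventDemo () {
    Event e {3, 1, 500};
    Event r = unpack(pack(e));
    assert(r.eDst == 3 && r.eTag == 1 && r.eVal == 500);
}
```

Because the whole event is one word, copying it is as cheap as copying an int, which is what makes passing events by value attractive.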

Priorities

Every worker has a unique ID, which is also its priority. You can only send requests from a lower-priority worker to a higher-priority worker, which is why all hardware-related workers (which used to be called “device drivers” in JeeH) need to be given the highest IDs.

These requests consist of three parts:

An event which identifies the worker, with a tag to specify what is requested.
An optional event for the worker to send back as completion indicator.
An optional void* pointer supplying additional request details.

When a worker receives a request, its process method is called with the above three arguments:

virtual Event process (Event in, Event out ={}, void* arg =nullptr);

If the work completes right away, the out argument can be returned as result, causing it to be sent as reply event. But workers can also decide to save up these events and use them as replies later. This is what makes requests asynchronous, as needed for e.g. timers, external interrupts, and communication protocol reads and writes.
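To make the two reply styles concrete, here is a small host-side sketch. Only the process signature is taken from the text; the Event layout, the OneShot worker, its tags, and the convention that an all-zero event means "no reply" are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>

struct Event { uint8_t eDst; uint8_t eTag; uint16_t eVal; };  // assumed layout

struct Worker {
    virtual ~Worker () {}
    virtual Event process (Event in, Event out ={}, void* arg =nullptr) =0;
};

// Hypothetical worker with one immediate and one deferred request type.
struct OneShot : Worker {
    enum Tag : uint8_t { NOW = 1, LATER = 2, EXPIRE = 3 };

    Event saved {};  // reply event held back for a deferred request

    Event process (Event in, Event out ={}, void* =nullptr) override {
        switch (in.eTag) {
            case NOW:                 // work is done right away:
                return out;           // hand back the reply immediately
            case LATER:               // work completes later:
                saved = out;          // keep the reply for now
                return {};            // assumed to mean "no reply yet"
            case EXPIRE: {            // the deferred work has completed:
                Event reply = saved;  // release the saved reply
                saved = {};
                return reply;
            }
        }
        return {};
    }
};

// Usage check: the same reply event is returned now, or held until EXPIRE.
inline void replyDemo () {
    OneShot w;
    Event reply {1, 7, 0};
    assert(w.process(Event{9, OneShot::NOW, 0}, reply).eTag == 7);
    assert(w.process(Event{9, OneShot::LATER, 0}, reply).eTag == 0);
    assert(w.process(Event{9, OneShot::EXPIRE, 0}).eTag == 7);
}
```

In a real system the EXPIRE case would presumably be driven by an interrupt event rather than by an explicit call, but the bookkeeping would look much the same.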

This design removes the distinction between application logic and device drivers: everything is a worker now. All the work is coordinated via requests and replies, based on the new lightweight events.

Interrupts

Interrupt handlers are much simpler now. They still run in that special “handler” mode, but they only perform the truly time-critical tasks (such as checking and clearing their interrupt status and flags).

Interrupt handlers are implemented as (non-virtual) methods of their corresponding worker. They are tied into the CMSIS IRQ dispatch vector using JeeH’s IRQ_HANDLER macro. Here’s how the Ticker object ties into the SysTick interrupt:

struct Ticker : Worker {
    enum TAG { TICK, ... };

    Event process (Event in, ...) {
        switch (in.eTag) {
            case TICK: ...
            ...
        }
        ...
    }

    void irqSysTick () {
        ...
        trigger(TICK);
    }
};

Ticker ticker;
IRQ_HANDLER(SysTick, ticker.irqSysTick)

Not only does this approach add less overhead than JeeH’s previous device driver + interrupt dispatch mechanism, it also unifies the way requests to the ticker are made with how the SysTick IRQ ties into events: both end up getting dispatched through process.

Atomicity

Apart from interrupts, everything should work as one might expect: requests from worker A to higher-priority worker B will dispatch the work by calling B’s process method, and replies lead to a call to A’s process method once B is done (i.e. returns to its caller).

Events from interrupts are a different story, however: when trigger is called, a new event is saved in a global area, a PendSV is requested, and the interrupt handler returns. If several different IRQs fire at nearly the same time, all their events will be saved. Once there are no more pending interrupts, PendSV takes over and decides what to do, depending on the current worker priority:

If any higher-priority worker has pending events, the current one is suspended (abruptly and pre-emptively) and the new worker is activated to process its pending event(s).
Otherwise, PendSV returns to let the current worker finish its job. Once done, and if there are no more pending events for this worker, JeeH looks for a lower-priority worker with pending events, and activates it. If this was a pre-empted worker, it will resume where it left off.
If there is no more work, the code drops back to main(), where an infinite loop decides which low-power mode to switch to and for how long.
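The decision described in the three cases above can be sketched as a small host-side model. This is not JeeH's implementation: the Scheduler type, the one-bit-per-worker pending mask, the MAX_WORKERS limit, and the use of ID 0 for "idle in main()" are all assumptions chosen to keep the example short.

```cpp
#include <cassert>
#include <cstdint>

constexpr int MAX_WORKERS = 8;  // hypothetical limit for this sketch

struct Scheduler {
    uint32_t pending = 0;  // bit N set: worker N has saved-up events
    int current = 0;       // ID (= priority) of the running worker, 0 = idle

    // What an IRQ handler's trigger boils down to here: record and return.
    void trigger (int id) { pending |= 1u << id; }

    // Highest-priority worker with pending events, or 0 if there is none.
    int highestPending () const {
        for (int id = MAX_WORKERS - 1; id > 0; --id)
            if (pending & (1u << id))
                return id;
        return 0;
    }

    // The PendSV decision: pre-empt only for a strictly higher priority.
    int pendSV () const {
        int top = highestPending();
        return top > current ? top : current;
    }

    // Called when the current worker has handled its events: pick the next
    // worker with pending events, or 0 to drop back to main().
    int finish () {
        pending &= ~(1u << current);
        current = highestPending();
        return current;
    }
};

// Usage check: worker 3 is running while workers 2 and 5 get triggered.
inline void schedDemo () {
    Scheduler s;
    s.current = 3;
    s.trigger(2);
    assert(s.pendSV() == 3);  // lower priority: worker 3 keeps running
    s.trigger(5);
    assert(s.pendSV() == 5);  // higher priority: worker 3 is pre-empted
}
```

Note that an event for the currently running worker never causes pre-emption in this model, since its ID is not strictly higher than the current one.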

The end result is that every worker can consider all its own methods to be guarded against its own interrupts: if the worker is active, all interrupt events for it will be saved up until it is done. If the triggered worker is not active and of a higher priority than the one currently running, it will suspend that one and immediately be allowed to process the interrupt event.

Status

Everything described above is preliminary. There’s enough working code at this point to feel confident that it can be made to work, but obviously the proof of the pudding is in the eating … let’s see how it goes!