Goodbye threads, hello workers
September 22, 2024
The big picture in JeeH is still not right: I don’t like the way device drivers
run in an exception-centric “handler” mode, whereas the rest of the application
uses “thread” mode. This distinction was needed to provide atomic guards around
certain parts of JeeH’s core data structures. This also affects the app-supplied
lowestPower and resumePower functions, used to enter sleep modes “when there
is no work”. It’s a bit odd that we need to be in a special mode when … idling!
The current approach is essentially “pessimistic”: always guard and lock, just
in case an interrupt might be triggered, to make sure that it stays out of our
way. An alternative is to turn things completely around: instead of locking out
interrupts by always calling <driver>::finish() from the PendSV exception
handler, an IRQ handler could generate and queue a message until there is no
risk of interference.
In other words: instead of the application anxiously guarding its data structures, IRQ handlers now postpone their actions when they detect that the application is not ready for them. This implies that an IRQ handler needs to know when the app is “ready” for it and when it is not.
Workers #
This is where “workers” come in: a worker is an object which takes care of a specific aspect of processing. This can be managing hardware, such as serial ports, SPI, I2C, etc. or it can be an application-specific activity, such as reading out a sensor, performing periodic checks, background calculations and statistics, or managing some communication protocol.
Workers are implemented as a subclass of JeeH’s Worker type. Everything in an
application is implemented as part of some worker and JeeH always tracks which
worker is currently running. Workers are created as global objects, with a small
amount of code in main() to start things up.
Workers lead to an event-driven style of programming: all workers have a
process method which is called whenever a message for that worker needs to be
dealt with.
Events #
The messages exchanged between workers and IRQ handlers are much simpler than
the current Message in JeeH, and will be called events from now on, also
to stress the event-driven approach.
An event is a tiny object which fits in a single 32-bit word. It only contains
an 8-bit destination worker ID (eDst), an 8-bit tag (eTag), and a 16-bit
payload (eVal). Events can be efficiently passed around by value.
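As a rough illustration of what such an event could look like, here is a minimal sketch. Only the field names and bit widths come from the description above; the field order, the isNull helper and the makeEvent function are assumptions made for this example, not JeeH's actual definitions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout: only the field names and widths come from the text.
struct Event {
    uint8_t  eDst = 0;  // destination worker ID
    uint8_t  eTag = 0;  // what is being requested or reported
    uint16_t eVal = 0;  // small payload

    // Assumption for this sketch: an all-zero event means "no event".
    bool isNull () const { return eDst == 0 && eTag == 0 && eVal == 0; }
};

// The whole point: an event is exactly one 32-bit word.
static_assert(sizeof (Event) == 4, "an Event must fit in one 32-bit word");

// Hypothetical helper. Events are cheap to copy, so they are returned by value.
Event makeEvent (uint8_t dst, uint8_t tag, uint16_t val =0) {
    Event e;
    e.eDst = dst;
    e.eTag = tag;
    e.eVal = val;
    return e;
}
```

Because the struct is a single word, passing or returning an Event by value should cost about the same as passing an integer.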
Priorities #
Every worker has a unique ID, which is also its priority. You can only send requests from a lower-priority worker to a higher-priority worker, which is why all hardware-related workers (which used to be called “device drivers” in JeeH) need to be given the highest IDs.
These requests consist of three parts:
- An event which identifies the worker, with a tag to specify what is requested.
- An optional event for the worker to send back as completion indicator.
- An optional void* pointer supplying additional request details.
When a worker receives a request, its process method is called with the above
three arguments:
virtual Event process (Event in, Event out ={}, void* arg =nullptr);
If the work completes right away, the out argument can be returned as result,
causing it to be sent as reply event. But workers can also decide to save up
these events and use them as replies later. This is what makes requests
asynchronous, as needed for e.g. timers, external interrupts, and communication
protocol reads and writes.
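To make the difference between an immediate and a deferred reply concrete, here is a small host-side sketch. Only the process signature is taken from the text. The workers registry, the request helper, the use of ID 0 as "nobody", and the Adder, Timer and App workers are all assumptions invented for this example; JeeH's real dispatch will differ.

```cpp
#include <cassert>
#include <cstdint>

struct Event {
    uint8_t eDst = 0, eTag = 0;
    uint16_t eVal = 0;
    bool isNull () const { return eDst == 0; }  // assumption: ID 0 is "nobody"
};

struct Worker {
    virtual ~Worker () {}
    virtual Event process (Event in, Event out ={}, void* arg =nullptr) = 0;
};

Worker* workers [8];  // hypothetical registry, indexed by worker ID

// Hypothetical helper: pass a request to its worker, deliver any immediate reply.
void request (Event in, Event out ={}, void* arg =nullptr) {
    Event reply = workers[in.eDst]->process(in, out, arg);
    if (!reply.isNull())
        workers[reply.eDst]->process(reply);
}

// Low-priority application worker: just counts the replies it receives.
struct App : Worker {
    int replies = 0;
    Event process (Event, Event, void*) override { ++replies; return {}; }
};

// Replies right away: returning "out" sends it as the completion event.
struct Adder : Worker {
    Event process (Event in, Event out, void* arg) override {
        if (arg != nullptr)
            *static_cast<int*>(arg) += in.eVal;
        return out;
    }
};

// Replies later: keeps the completion event until fire() is called.
struct Timer : Worker {
    Event saved;
    Event process (Event, Event out, void*) override {
        saved = out;   // remember who wants to be told
        return {};     // no reply yet
    }
    void fire () {
        Event e = saved;
        saved = {};
        if (!e.isNull())
            workers[e.eDst]->process(e);
    }
};
```

With App registered as worker 1, Adder as 2 and Timer as 3, a request to the Adder bumps the App's reply count before request returns, while a request to the Timer only does so once fire() runs.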
This design removes the distinction between application logic and device drivers: everything is a worker now. All the work is coordinated via requests and replies, based on the new lightweight events.
Interrupts #
Interrupt handlers are much simpler now. They still run in that special “handler” mode, but they only perform the truly time-critical tasks (such as checking and clearing their interrupt status and flags).
Interrupt handlers are implemented as (non-virtual) methods of their
corresponding worker. They are tied into the CMSIS IRQ dispatch vector using
JeeH’s IRQ_HANDLER macro - here’s how the Ticker object ties into the
SysTick interrupt:
struct Ticker : Worker {
    enum TAG { TICK, ... };
    void process (Event in, ...) {
        switch (in.eTag) {
            case TICK: ...
            ...
        }
        ...
    }
    void irqSysTick () {
        ...
        trigger(TICK);
    }
};
Ticker ticker;
IRQ_HANDLER(SysTick, ticker.irqSysTick)
Not only does this approach add less overhead than JeeH’s previous device
driver + interrupt dispatch mechanism, it also unifies the way requests to the
ticker are made with how the SysTick IRQ ties into events: both end up getting
dispatched through process.
Atomicity #
Apart from interrupts, everything should work as one might expect: requests from
worker A to higher-priority worker B will dispatch the work by calling B’s
process method, and replies lead to a call to A’s process method once B is
done (i.e. returns to its caller).
Events from interrupts are a different story, however: when trigger is called,
a new event is saved in a global area, a PendSV is requested, and the interrupt
handler returns. If several different IRQs fire at nearly the same time, all
their events will be saved. Once there are no more pending interrupts, PendSV
takes over and decides what to do, depending on the current worker priority:
- If any higher-priority worker has pending events, the current one is suspended (abruptly and pre-emptively) and the new worker is activated to process its pending event(s).
- Otherwise, PendSV returns to let the current worker finish its job. Once done, and if there are no more pending events for this worker, JeeH looks for a lower-priority worker with pending events, and activates it. If this was a pre-empted worker, it will resume where it left off.
- If there is no more work, the code drops back to main(), where an infinite loop decides which low-power mode to switch to and for how long.
The end result is that every worker can consider all its own methods to be guarded against its own interrupts: if the worker is active, all interrupt events for it will be saved up until it is done. If the triggered worker is not active and of a higher priority than the one currently running, it will suspend that one and immediately be allowed to process the interrupt event.
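The dispatch rules above can be tried out on a desktop machine with a small simulation. This is only a sketch under loud assumptions: there is no real PendSV here (it is an ordinary function that has to be called by hand after the simulated interrupts), pre-emption is modelled as a nested function call, and pending, current, trace, runWorker and the one-shot interrupt burst are all names invented for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Event { uint8_t eDst = 0, eTag = 0; uint16_t eVal = 0; };

constexpr int MAX_WORKERS = 8;             // hypothetical limit
std::vector<Event> pending [MAX_WORKERS];  // saved events, per destination worker
int current = 0;         // priority of the running worker, 0 = idling in main()
std::vector<int> trace;  // order in which workers ran, for inspection

// Called from a (simulated) IRQ handler: only save the event, nothing else.
void trigger (uint8_t dst, uint8_t tag) {
    if (dst >= MAX_WORKERS)
        return;
    Event e;
    e.eDst = dst;
    e.eTag = tag;
    pending[dst].push_back(e);
}

void pendSV ();

bool irqDuringWorker2 = true;  // demo only: one-shot simulated interrupt burst

// Stand-in for a worker's process method.
void runWorker (int id, Event) {
    trace.push_back(id);
    if (id == 2 && irqDuringWorker2) {
        irqDuringWorker2 = false;
        trigger(2, 0);  // worker 2's own interrupt: waits until this call returns
        trigger(5, 0);  // higher priority: pre-empts worker 2 right away
        pendSV();
    }
}

// Stand-in for PendSV: run pending events for workers above the current priority,
// highest first, one event at a time.
void pendSV () {
    for (;;) {
        int id = MAX_WORKERS - 1;
        while (id > current && pending[id].empty())
            --id;
        if (id <= current)
            return;            // nothing that may pre-empt the current worker
        Event e = pending[id].front();
        pending[id].erase(pending[id].begin());
        int saved = current;
        current = id;          // "suspend" whatever was running
        runWorker(id, e);
        current = saved;       // resume where it left off
    }
}
```

Triggering events for workers 2 and 5 from the idle state and then calling pendSV() runs them in the order 5, 2, 5, 2: worker 5 always goes first, and the event worker 2 receives while it is busy is held back until its current call has returned.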
Status #
Everything described above is preliminary. There’s enough working code at this point to feel confident that it can be made to work, but obviously the proof of the pudding is in the eating … let’s see how it goes!