Four ways to SPI

June 12, 2024

Musings

One of the problems I want to address in JeeH is how best to interface with peripherals: built-in ones as well as those connected via a common bus, e.g. I2C or SPI. There are two sides to this: talking to built-in hardware via device registers, and talking through built-in hardware to a connected module / chip.

Hardware register access

Talking to the built-in hardware is a matter of reading and writing the hardware registers at specific addresses. This is already solved in JeeH with the use of IoReg<...> definitions. To configure the “B” port of the GPIO hardware in an STM32 for example, JeeH defines a GPIOB object (a constant type really) which can be referenced as an array. Here’s how to set its pin 8 high:

enum { ODR=0x14 };
GPIOB[ODR] |= 1<<8;

Setting individual bits is so common that JeeH also has a special notation for it:

GPIOB[ODR](8) = 1;

With IoReg, writing a full-blown interface for a UART, SPI, or I2C device is simply a matter of setting up the proper values in their associated registers.
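To give an idea of how such an array-plus-bit notation can be built in C++, here is a minimal sketch which runs on a host, using a plain array in place of real hardware registers. The names RegBlock, RegWord, RegBit, and demoOdr are made up for this illustration; this is not JeeH's actual IoReg implementation, which may well work differently.

```cpp
#include <cstdint>

// Hypothetical sketch, not JeeH's IoReg: a proxy for one bit of a register.
struct RegBit {
    volatile uint32_t& word;
    int pos;

    // assigning 0 or 1 clears or sets just this bit
    void operator= (int v) const {
        uint32_t mask = uint32_t{1} << pos;
        word = v ? (word | mask) : (word & ~mask);
    }
    operator int () const { return (word >> pos) & 1; }
};

// Proxy for one 32-bit register, supporting reg |= x and reg(bit) = v.
struct RegWord {
    volatile uint32_t& word;

    operator uint32_t () const { return word; }
    void operator= (uint32_t v) const { word = v; }
    void operator|= (uint32_t v) const { word = word | v; }
    RegBit operator() (int bit) const { return {word, bit}; }
};

// A block of registers, indexed by byte offset as in vendor datasheets.
struct RegBlock {
    volatile uint32_t* base;  // on real hardware: the peripheral's base address

    RegWord operator[] (int offset) const { return {base[offset / 4]}; }
};

// Host-side demo: a plain array stands in for the GPIOB register block.
uint32_t demoOdr () {
    static volatile uint32_t fakeGpio [16] = {};
    RegBlock GPIOB {fakeGpio};
    enum { ODR = 0x14 };

    GPIOB[ODR] |= 1<<8;   // set pin 8
    GPIOB[ODR](3) = 1;    // set pin 3 via the bit proxy
    GPIOB[ODR](8) = 0;    // clear pin 8 again
    return GPIOB[ODR];    // only bit 3 remains set
}
```

The proxies hold a reference to the register word, so an optimising compiler can usually reduce each statement to a plain read-modify-write on the register address.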

GPIO and pins

The “General Purpose I/O” hardware is the most elementary way of controlling the individual pins of a microcontroller. Making an LED blink by periodically toggling a pin is the embedded world’s equivalent of the “Hello world” code task given to novice software programmers: you alternately set and clear the pin to which the LED is attached, with some delays. JeeH has the Pin class for this:

Pin led ("B8");
led.mode("P"); // put the pin in push-pull output mode
while (true) {
    led = 1;
    sys::wait(100);
    led = 0;
    sys::wait(400);
}

As if by magic, what looks like a led “variable” in C++ now controls an external I/O pin.

The SPI bus

The “Serial Peripheral Interface” (SPI) bus is a simple way of connecting external peripherals and exchanging data with them at fairly high speed (several Mbits per second). It uses 3 I/O pins, plus one for each attached device.

Bit-banged SPI

The most basic implementation does everything in software, and can use any four GPIO pins to communicate with an attached SPI-compatible device, such as a sensor, or a flash memory chip. The API for this looks as follows:

struct Spi {
    void enable ();
    void disable ();
    int transfer (int v);
    void transfer (uint8_t const* out, uint8_t* in, int len);
};

A complete transaction consists of the following steps: enable -> one or more transfer calls -> disable. The second “block transfer” variant is more efficient (out or in can be null if not used).

There is an implementation in JeeH which does exactly that: a type with init & deinit methods plus the above methods, using the GPIO and Pin types to toggle pins and read/write bytes serially:

struct SpiGpio {
    void init (char const* desc) {...}
    void deinit () {...}

    void enable () const {...}
    void disable () const {...}
    int transfer (int v) const {...}
    void transfer (uint8_t const* out, uint8_t* in, int len) const {...}
    ...
};

This “SPI bus interface” is very simple and works well. Its benefits are: 1) it works with any pins available on the chip, and 2) the implementation is very small. The main drawback is that it’s slow: the µC must loop through each of the bits to be transferred (in both directions). In actual use, such an SPI bus will transfer 50..500 Kbits per second, at best.
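The “loop through each of the bits” part can be sketched as follows. This is only an illustration, not JeeH's SpiGpio code: it assumes SPI mode 0 (clock idles low, data sampled on the rising edge) and MSB-first bit order, and it uses a made-up FakePin type so that it can run on a host. The names BitBangSpi and demoLoopback are also hypothetical.

```cpp
#include <cstdint>

// Hypothetical pin stand-in for running on a host; on a µC this would be
// a type which drives or samples a real GPIO pin.
struct FakePin {
    int level = 0;
    void write (int v) { level = v; }
    int read () const { return level; }
};

// Bit-banged SPI sketch, assuming mode 0 and MSB-first bit order.
struct BitBangSpi {
    FakePin& sclk; FakePin& mosi; FakePin& miso; FakePin& nsel;

    void enable () { nsel.write(0); }   // select the device (active low)
    void disable () { nsel.write(1); }

    int transfer (int v) {
        int result = 0;
        for (int bit = 7; bit >= 0; --bit) {
            mosi.write((v >> bit) & 1);            // present the next output bit
            sclk.write(1);                         // rising edge: device samples
            result = (result << 1) | miso.read();  // sample the device's bit
            sclk.write(0);                         // falling edge: on to next bit
        }
        return result;
    }

    // block variant: out or in may be null if not used
    // (sending 0xFF when out is null is an assumption of this sketch)
    void transfer (uint8_t const* out, uint8_t* in, int len) {
        for (int i = 0; i < len; ++i) {
            int r = transfer(out ? out[i] : 0xFF);
            if (in) in[i] = (uint8_t) r;
        }
    }
};

// Loopback demo: MOSI and MISO share one pin, so each byte comes back as sent.
int demoLoopback (int v) {
    FakePin sclk, data, nsel;
    BitBangSpi spi {sclk, data, data, nsel};
    spi.enable();
    int r = spi.transfer(v);
    spi.disable();
    return r;
}
```

Every bit costs several pin accesses plus loop overhead, which is where the speed limit of this approach comes from.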

Hardware SPI

Because SPI is so common and useful, all µCs have built-in hardware support for it. This takes the “loop-per-bit” aspect out of the equation. The API is almost the same as above:

struct SpiHw {
    SpiHw (Config const c) ...

    void init (char const* defs, int speed) {...}
    void deinit () {...}

    void enable () const { nsel = 0; }
    void disable () const { nsel = 1; }
    int transfer (int v) const {...}
    void transfer (uint8_t const* out, uint8_t* in, int len) const {...}
    ...
};

The four main functions are the same, but this version needs a bit more information about which hardware registers to use (if the µC supports multiple SPI buses) and what bit rate to use (derived from the system clock).

Apart from the extra configuration requirements, day-to-day use is identical. Such a hardware interface will often handle up to 5..50 Mbits/sec.

DMA-based SPI

At high transfer rates, it becomes hard for the CPU to keep up: writing bytes to the “TX” register and reading bytes from the “RX” register takes time. At “just” 8 Mbits/sec, each byte must be written and read within 1 µs to keep the SPI bus running at full speed. Quite a challenge if the CPU runs at say 16 MHz.
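To put a number on that challenge, the cycle budget per byte is simply the CPU clock times 8, divided by the SPI bit rate. A small worked example (cyclesPerByte is a helper name made up for this illustration):

```cpp
#include <cstdint>

// CPU clock cycles available per transferred byte, for a given CPU clock
// and SPI bit rate (both in Hz).
constexpr uint32_t cyclesPerByte (uint32_t cpuHz, uint32_t spiBitsPerSec) {
    return static_cast<uint32_t>(uint64_t{cpuHz} * 8 / spiBitsPerSec);
}

// 16 MHz CPU at 8 Mbit/s: only 16 cycles to write TX, wait, and read RX
static_assert(cyclesPerByte(16'000'000, 8'000'000) == 16, "16 MHz, 8 Mbit/s");

// a 72 MHz CPU at the same bit rate has a more comfortable 72 cycles
static_assert(cyclesPerByte(72'000'000, 8'000'000) == 72, "72 MHz, 8 Mbit/s");
```

With only 16 cycles per byte, there is little room left for loop overhead, let alone for interrupts.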

The next step up is to use “Direct Memory Access” (DMA) hardware, which is also present in most modern µCs. With DMA, you give it the address and count of an entire transfer, and it does the rest. With SPI, the block transfer method needs two DMA “channels”: one for out and one for in.

JeeH’s implementation looks as follows:

struct SpiDma : SpiHw, Device {
    SpiDma (Config const& c) ...

    void init (char const* defs, int speed) {...}
    void deinit () {...}

    void enable () const {...}       // re-used from SpiHw
    void disable () const {...}      // re-used from SpiHw
    int transfer (int v) const {...} // re-used from SpiHw
    void transfer (uint8_t const* out, uint8_t* in, int len) const {...}
    ...
};

Internally, this implementation is considerably more complex than the first two. There are interrupts involved (to signal the end of a DMA transfer), which is why the whole implementation must be based on JeeH’s “device driver” model.

On the outside, in actual use, there are the same four method calls as before.

The key difference here is that the block transfer now uses DMA. The CPU will enter sleep mode while DMA keeps the transfer(s) going at top speed. This will easily handle 50 Mbit/sec if the µC’s clock is fast enough: in most chips, hardware SPI buses can run at up to half the system clock speed.

Fast SPI buses are useful for things like flash memory chips and SPI-connected LCD displays.

Asynchronous DMA

A drawback of the previous three implementations is that the CPU is not able to handle any other work while these transfers are in progress. In many cases that’s probably fine, but there may be other activities which need attention, such as incoming serial data or continuous ADC sampling.

Meet the fourth SPI implementation provided by JeeH:

struct SpiDev : SpiDma {
    ...
};

This implementation is based on the SpiDma class. It has exactly the same API, so it’s not repeated here. It differs in one crucial way from SpiDma: instead of putting the CPU to sleep, all block transfers use JeeH’s sys::call mechanism, which is multi-threading aware: the current thread will be suspended while the DMA transfer is taking place, allowing other threads to run.

On the surface, things remain the same: SpiDev can be used in the same way as the other three variants: a call to the block transfer method returns once the transfer is finished. The difference is that other threads continue to run, handling whatever tasks they are responsible for. Whenever no other threads have any work to do, the CPU enters a lower-power sleep state, just like SpiDma.

(not further explained here: SpiDma can also be used in 100% non-blocking mode via sys::send)

It’s all about choice

With these four variants, the choice is completely up to the application. And the C++ template system is what makes this truly flexible: a higher-level protocol driver or peripheral-specific processing module can use any of these variants.

As an example, there is an interface implementation in JeeH for the “RFM69” ISM-band low-power radio module. Its implementation is as follows (details omitted for brevity):

template< typename SPI >
struct RF69 {
    SPI& spi;

    RF69 (SPI& s) : spi (s) {}
    void init (...) {...}

    void send (uint8_t header, uint8_t const* ptr, int len) {...}
    int receive (uint8_t* ptr, int len) {...}
    void sleep () {...}
    ...
};

This RF69 code can be used with bit-banged SpiGpio for example, connected to any µC pins:

SpiGpio spi;
spi.init(...);
RF69 rf (spi);
...
rf.send(...);
rf.receive(...);
rf.sleep();

It can also easily be switched to use another variant:

SpiDma spi (...);
spi.init(...);
RF69 rf (spi);
...

The key observation here is that the RF69 implementation remains the same in all use cases. If you have a completely different SPI implementation with the same enable/disable/transfer API, then the RF69 code can use it. Taken to extremes: that RFM69 radio module could be off-site, attached via an internet connection and an “SPI-over-TCP/IP” bridge … and it would work (slowly!).
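This “anything with the right methods will do” property is easy to try out on a host. The sketch below uses two made-up types: SpiLog, a fake SPI which only records what is sent, and RegChip, a tiny register-style driver. Neither is part of JeeH, and the register convention (top bit set means write) is an assumption chosen for the example.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical SPI stand-in which only records what is sent; it offers the
// same enable/disable/transfer calls as the variants described above.
struct SpiLog {
    std::vector<int> sent;
    int selects = 0;

    void enable () { ++selects; }
    void disable () {}
    int transfer (int v) { sent.push_back(v); return 0x24; }  // canned reply
};

// Hypothetical register-style chip driver (not JeeH's RF69): it compiles
// against any SPI type with the right methods, no base class required.
template< typename SPI >
struct RegChip {
    SPI& spi;

    RegChip (SPI& s) : spi (s) {}

    // assumed convention: top bit set means "write", clear means "read"
    void writeReg (int addr, int val) {
        spi.enable();
        spi.transfer(addr | 0x80);
        spi.transfer(val);
        spi.disable();
    }

    int readReg (int addr) {
        spi.enable();
        spi.transfer(addr & 0x7F);
        int r = spi.transfer(0);  // clock out a dummy byte to read the reply
        spi.disable();
        return r;
    }
};
```

Usage is the same as with a real bus: create a SpiLog, pass it to RegChip<SpiLog>, and inspect the recorded bytes afterwards. This also makes for a convenient way to unit-test driver code without any hardware attached.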

One more thing …

While the design presented so far is flexible, there is one aspect which can be an issue: if some (presumably complex) module interface is based on SPI and there are multiple instances in use, e.g. one module using DMA for performance reasons and a second one connected via SpiGpio (perhaps it needs to use specific I/O pins?), then the template system will compile two separate versions of the module interface code. With RF69, you’ll get two copies: one specialised to access module #1 via DMA and the other specialised to access module #2 via GPIO.

For a simple type such as RF69, this code overhead would be negligible. But this design is all about supporting a very large variety of module interfaces. Some may be quite complex, or perhaps the µC simply has very little flash memory.

In C++, one solution is to use “abstract virtual base classes”, which transform such polymorphism into a run-time mechanism, instead of the compile-time nature of C++’s templates. This leads to a clear trade-off between having optimised copies and run-time re-use of the same source code.

With SPI, such an approach can also be supported:

SpiWrap<SpiDma> spi1 (...);
spi1.init(...);
RF69 rf1 ((SpiBase&) spi1);

SpiWrap<SpiGpio> spi2;
spi2.init(...);
RF69 rf2 ((SpiBase&) spi2);

...

There are now two radios and two instances of RF69, called rf1 and rf2. But since they are both based on objects of the same type (SpiBase), the RF69 code is not duplicated into two different specialisations. The RF69 methods exist exactly once in the application. When the RF69 methods are called, they will use C++’s standard “v-table dispatch” to determine at run-time which SPI bus interface to use.

A side-effect of this approach is that you can pass around a pointer to either SPI bus, by using SpiBase* mySpi = &spi1; (or spi2). At “base class” level, the SPI buses are interchangeable.
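For readers wondering what such a wrapper might look like, here is one possible shape. This is a guess at the general idea, not JeeH's actual SpiBase and SpiWrap definitions: the wrapper derives from both the concrete SPI type and an abstract base, and forwards each virtual call. SpiA, SpiB, and Probe are made-up stand-ins to keep the example self-contained.

```cpp
// Assumed shape of the run-time interface; JeeH's SpiBase may differ.
struct SpiBase {
    virtual void enable () = 0;
    virtual void disable () = 0;
    virtual int transfer (int v) = 0;
    virtual ~SpiBase () = default;
};

// Wrapper which derives from both the concrete SPI type and SpiBase,
// forwarding each virtual call to the concrete implementation.
template< typename SPI >
struct SpiWrap : SPI, SpiBase {
    using SPI::SPI;  // inherit the concrete type's constructors, if any

    void enable () override { SPI::enable(); }
    void disable () override { SPI::disable(); }
    int transfer (int v) override { return SPI::transfer(v); }
};

// Two hypothetical SPI variants with distinguishable behaviour.
struct SpiA {
    void enable () {}
    void disable () {}
    int transfer (int v) { return v + 1; }
};
struct SpiB {
    void enable () {}
    void disable () {}
    int transfer (int v) { return v * 2; }
};

// A non-template driver: compiled once, dispatching through the v-table.
struct Probe {
    SpiBase& spi;

    int exchange (int v) {
        spi.enable();
        int r = spi.transfer(v);
        spi.disable();
        return r;
    }
};
```

A Probe bound to a SpiWrap<SpiA> and another bound to a SpiWrap<SpiB> share the same exchange code, at the cost of one indirect call per SPI operation.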

This example is a bit contrived (who needs two RFM69 modules on one µC?), but it illustrates that there’s a lot of choice. In most scenarios, the simple notation is best, as it’s the most efficient (in both code size and speed), but when the flexibility is needed, the “wrapped” approach is available.

It has taken me quite a bit of time to arrive at this design. In JeeH v1, the static types of everything made it impossible to implement these wrappers. For the infinitely curious: a code example can be found in the dev branch of JeeH’s git repository, see examples/hytiny/poll.cpp.

To sum it all up: higher-level code does NOT need to limit the way it can be used in an application!