Four ways to SPI
June 12, 2024
One of the problems I want to address in JeeH is how best to interface with peripherals, both built-in ones and those connected via a common bus such as I2C or SPI. There are two sides to this: talking to built-in hardware via device registers, and talking through built-in hardware to a connected module or chip.
Hardware register access
Talking to the built-in hardware is a matter of reading and writing the
hardware registers at specific addresses. This is already solved in JeeH with
the use of IoReg<...> definitions. To configure the “B” port of the GPIO
hardware in an STM32 for example, JeeH defines a GPIOB object (a constant type
really) which can be referenced as an array. Here’s how to set its pin 8 high:
enum { ODR=0x14 };
GPIOB[ODR] |= 1<<8;
Setting individual bits is so common that JeeH also has a special notation for it:
GPIOB[ODR](8) = 1;
With IoReg, writing a full-blown interface for a UART, SPI, or I2C device is
simply a matter of setting up the proper values in their associated registers.
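To make the two notations above less mysterious, here is a minimal host-side sketch of how an array-like register object with a per-bit accessor could be built with operator overloading. This is not JeeH's IoReg: the names RegBank, Word, Bit, GPIOB_SIM, and the fakeRegs array standing in for memory-mapped hardware are all made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Simulated register block (assumption); on a real chip this is a fixed address.
static uint32_t fakeRegs [16];

// Proxy for a single bit inside a 32-bit register.
struct Bit {
    volatile uint32_t* addr;
    int pos;
    operator int () const { return (*addr >> pos) & 1u; }
    void operator= (int v) const {
        *addr = (*addr & ~(1u << pos)) | ((v & 1u) << pos);
    }
};

// Proxy for a whole 32-bit register.
struct Word {
    volatile uint32_t* addr;
    operator uint32_t () const { return *addr; }
    void operator= (uint32_t v) const { *addr = v; }
    void operator|= (uint32_t v) const { *addr = *addr | v; }
    Bit operator() (int pos) const { return Bit {addr, pos}; }
};

// Hypothetical peripheral object, indexed by byte offset like GPIOB[ODR].
struct RegBank {
    uint32_t* base;
    Word operator[] (int byteOffset) const {
        return Word {base + byteOffset / 4};
    }
};

enum { ODR = 0x14 };
static const RegBank GPIOB_SIM {fakeRegs};

// Exercises both notations and returns the final register value.
uint32_t demo () {
    GPIOB_SIM[ODR] = 0;
    GPIOB_SIM[ODR] |= 1 << 8;   // set bit 8 with a read-modify-write
    GPIOB_SIM[ODR](3) = 1;      // set bit 3 via the bit accessor
    GPIOB_SIM[ODR](8) = 0;      // clear bit 8 again
    return GPIOB_SIM[ODR];
}
```

The proxies are plain aggregates holding a pointer, so a compiler can usually reduce them to the same load and store instructions as hand-written register code.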
GPIO and pins
The “General Purpose I/O” hardware is the most elementary way of controlling the
individual pins of a microcontroller. Making an LED blink by periodically
toggling a pin is the embedded world’s equivalent of the “Hello world” code task
given to novice software programmers: you alternately set and clear the pin to
which the LED is attached, with some delays. JeeH has the Pin class for this:
Pin led ("B8");
led.mode("P"); // put the pin in push-pull output mode
while (true) {
    led = 1;
    sys::wait(100);
    led = 0;
    sys::wait(400);
}
As if by magic, what looks like a led “variable” in C++ now controls an
external I/O pin.
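The “magic” is an overloaded assignment operator. The sketch below shows the idea on a host machine, assuming a made-up SimPin type and a fakeOdr array in place of real GPIO registers; JeeH's actual Pin class is not shown in this post and may work differently.

```cpp
#include <cassert>
#include <cstdint>

// Simulated output data registers for ports A..D (assumption, not real GPIO).
static uint32_t fakeOdr [4];

// Hypothetical pin type: parses a name such as "B8" into a port and a bit.
struct SimPin {
    int port;  // 0 = A, 1 = B, ...
    int bit;   // pin number within the port

    explicit SimPin (char const* name)
        : port (name[0] - 'A'), bit (parseNumber(name + 1)) {}

    // Assigning 0 or 1 drives the simulated pin low or high.
    void operator= (int v) const {
        uint32_t mask = 1u << bit;
        if (v) fakeOdr[port] |= mask; else fakeOdr[port] &= ~mask;
    }

    // Reading the object yields the current (simulated) pin level.
    operator int () const { return (fakeOdr[port] >> bit) & 1u; }

    static int parseNumber (char const* s) {
        int n = 0;
        while (*s >= '0' && *s <= '9') n = 10 * n + (*s++ - '0');
        return n;
    }
};

// Sets and clears "B8", returning 10*high + low as observed through the pin.
int demo () {
    SimPin led ("B8");
    led = 1;
    int high = led;
    led = 0;
    int low = led;
    return 10 * high + low;
}
```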
The SPI bus
The “Serial Peripheral Interface” (SPI) bus is a simple way of connecting external peripherals and exchanging data with them at fairly high speed (several Mbit per second). It uses 3 shared I/O pins (clock, data out, and data in), plus one select pin for each attached device.
Bit-banged SPI
The most basic implementation does everything in software, and can use any four GPIO pins to communicate with an attached SPI-compatible device, such as a sensor, or a flash memory chip. The API for this looks as follows:
struct Spi {
    void enable ();
    void disable ();
    int transfer (int v);
    void transfer (uint8_t const* out, uint8_t* in, int len);
};
A complete transaction consists of the following steps: enable -> one or more
transfer calls -> disable. The second “block transfer” variant is more
efficient (out or in can be null if not used).
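As an illustration of such a transaction, the sketch below reads the 3-byte identification from a simulated SPI flash chip using the widely used JEDEC “read ID” command (0x9F). FakeFlashSpi and readJedecId are hypothetical names, and the ID bytes are invented; only the enable / transfer / disable sequence reflects the API above.

```cpp
#include <cassert>
#include <cstdint>

// Simulated flash chip which only answers the "read ID" command 0x9F.
struct FakeFlashSpi {
    bool selected = false;
    bool idRequested = false;
    int next = 0;
    uint8_t id [3] = {0x12, 0x34, 0x56};  // made-up ID bytes

    void enable ()  { selected = true; idRequested = false; next = 0; }
    void disable () { selected = false; }

    int transfer (int v) {
        if (!selected) return 0xFF;
        if (!idRequested) { idRequested = (v == 0x9F); return 0xFF; }
        return next < 3 ? id[next++] : 0xFF;
    }

    void transfer (uint8_t const* out, uint8_t* in, int len) {
        for (int i = 0; i < len; ++i) {
            int r = transfer(out != nullptr ? out[i] : 0);
            if (in != nullptr) in[i] = (uint8_t) r;
        }
    }
};

// One complete transaction: select, send command, read reply, deselect.
template <typename SPI>
uint32_t readJedecId (SPI& spi) {
    uint8_t reply [3] = {};
    spi.enable();
    spi.transfer(0x9F);               // single-byte transfer for the command
    spi.transfer(nullptr, reply, 3);  // block transfer, nothing to send
    spi.disable();
    return (uint32_t) reply[0] << 16 | (uint32_t) reply[1] << 8 | reply[2];
}
```

Passing a null out pointer here means “clock in three bytes, the outgoing data does not matter”, which is the typical pattern for reading a reply.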
There is an implementation in JeeH which does exactly that: a type with init
& deinit methods plus the above methods, using the GPIO and Pin types to
toggle pins and read/write bytes serially:
struct SpiGpio {
    void init (char const* desc) {...}
    void deinit () {...}
    void enable () const {...}
    void disable () const {...}
    int transfer (int v) const {...}
    void transfer (uint8_t const* out, uint8_t* in, int len) const {...}
    ...
};
This “SPI bus interface” is very simple and works well. Its benefits are: 1) it works with any pins available on the chip, and 2) the implementation is very small. The main drawback is that it’s slow: the µC must loop through each of the bits to be transferred (in both directions). In actual use, such an SPI bus will transfer 50 to 500 kbit per second, at best.
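The per-bit loop looks roughly like the sketch below. It is not the SpiGpio source: BitBangSpi and LoopbackPins are hypothetical, the pins are simulated with MISO wired straight to MOSI, and the timing shown (data set up before the rising clock edge, sampled right after it, MSB first) is an assumption corresponding to the common SPI “mode 0”.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for four GPIO pins; MISO echoes MOSI (loopback).
struct LoopbackPins {
    int mosi = 0, sclk = 0, nsel = 1;
    int clockEdges = 0;
    void setMosi (int v) { mosi = v; }
    void setSclk (int v) { if (v != sclk) ++clockEdges; sclk = v; }
    void setNsel (int v) { nsel = v; }
    int getMiso () const { return mosi; }
};

template <typename PINS>
struct BitBangSpi {
    PINS& pins;

    void enable ()  { pins.setNsel(0); }  // select is active low
    void disable () { pins.setNsel(1); }

    // Exchange one byte, most significant bit first.
    int transfer (int v) {
        int result = 0;
        for (int bit = 7; bit >= 0; --bit) {
            pins.setMosi((v >> bit) & 1);           // present outgoing bit
            pins.setSclk(1);                        // rising edge
            result = (result << 1) | pins.getMiso(); // sample incoming bit
            pins.setSclk(0);                        // falling edge
        }
        return result;
    }

    // Block variant: either pointer may be null if that direction is unused.
    void transfer (uint8_t const* out, uint8_t* in, int len) {
        for (int i = 0; i < len; ++i) {
            int r = transfer(out != nullptr ? out[i] : 0);
            if (in != nullptr) in[i] = (uint8_t) r;
        }
    }
};
```

Each byte costs eight loop iterations with four pin accesses each, which is where the speed limit of this variant comes from.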
Hardware SPI
Because SPI is so common and useful, virtually all µCs have built-in hardware support for it. This takes the “loop-per-bit” aspect out of the equation. The API is almost the same as above:
struct SpiHw {
    SpiHw (Config const c) ...
    void init (char const* defs, int speed) {...}
    void deinit () {...}
    void enable () const { nsel = 0; }
    void disable () const { nsel = 1; }
    int transfer (int v) const {...}
    void transfer (uint8_t const* out, uint8_t* in, int len) const {...}
    ...
};
The four main functions are the same, but this version needs a bit more information about which hardware registers to use (if the µC supports multiple SPI buses) and what bit rate to use (derived from the system clock).
Apart from the extra configuration requirements, day-to-day use is identical. Such a hardware interface will often handle up to 5..50 Mbits/sec.
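How a bit rate gets “derived from the system clock” depends on the chip. As a sketch, assume a peripheral which divides its bus clock by a power of two from 2 to 256, selected by a 3-bit code, as found in many STM32 SPI blocks. The function pickSpiRate and the SpiRate struct are hypothetical and do not show how JeeH's init interprets its speed argument.

```cpp
#include <cassert>
#include <cstdint>

// Result of the search: the 3-bit divider code and the resulting bit rate.
struct SpiRate {
    int code;      // divider is 2^(code+1), i.e. 2, 4, ... 256
    uint32_t hz;   // actual SPI clock obtained
};

// Pick the fastest rate which does not exceed maxHz.
// If even the largest divider is too fast, fall back to the slowest setting.
SpiRate pickSpiRate (uint32_t busClockHz, uint32_t maxHz) {
    for (int code = 0; code < 8; ++code) {
        uint32_t hz = busClockHz >> (code + 1);
        if (hz <= maxHz)
            return SpiRate {code, hz};
    }
    return SpiRate {7, busClockHz >> 8};
}
```

With a 72 MHz bus clock and a 10 MHz limit, this settles on divide-by-8, giving 9 MHz: the requested speed is an upper bound, not an exact value.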
DMA-based SPI
At high transfer rates, it becomes hard for the CPU to keep up: writing bytes to the “TX” register and reading bytes from the “RX” register takes time. At “just” 8 Mbits/sec, each byte must be written and read within 1 µs to keep the SPI bus running at full speed. Quite a challenge if the CPU runs at say 16 MHz.
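The arithmetic behind that claim fits in one line. The helper below is purely illustrative (cyclesPerByte is not part of JeeH): it computes how many CPU clock cycles are available per transferred byte.

```cpp
#include <cassert>
#include <cstdint>

// CPU cycles available per SPI byte: 8 bits per byte, cpuHz cycles per second.
constexpr uint32_t cyclesPerByte (uint32_t cpuHz, uint32_t spiBitsPerSec) {
    return (uint32_t) ((uint64_t) cpuHz * 8 / spiBitsPerSec);
}

// 16 MHz CPU, 8 Mbit/s SPI: only 16 cycles to write one byte and read one back.
static_assert(cyclesPerByte(16000000, 8000000) == 16, "16 cycles per byte");
```

Sixteen cycles leaves very little room for loop overhead, status polling, and two memory accesses per byte.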
The next step up is to use “Direct Memory Access” (DMA) hardware, which is also
present in most modern µCs. With DMA, you give it the address and count of an
entire transfer, and it does the rest. With SPI, the block transfer method needs
two DMA “channels”: one for out and one for in.
JeeH’s implementation looks as follows:
struct SpiDma : SpiHw, Device {
    SpiDma (Config const& c) ...
    void init (char const* defs, int speed) {...}
    void deinit () {...}
    void enable () const {...} // re-used from SpiHw
    void disable () const {...} // re-used from SpiHw
    int transfer (int v) const {...} // re-used from SpiHw
    void transfer (uint8_t const* out, uint8_t* in, int len) const {...}
    ...
};
Internally, this implementation is considerably more complex than the first two. There are interrupts involved (to signal the end of a DMA transfer), which is why the whole implementation must be based on JeeH’s “device driver” model.
On the outside, in actual use, there are the same four method calls as before.
The key difference here is that the block transfer now uses DMA. The CPU will enter sleep mode while DMA keeps the transfer(s) going at top speed. This will easily handle 50 Mbit/sec if the µC’s clock is fast enough: in most chips, hardware SPI buses can run at up to half the system clock speed.
Fast SPI buses are useful for things like flash memory chips and SPI-connected LCD displays.
Asynchronous DMA
A drawback of the previous three implementations is that the CPU is not able to handle any other work while these transfers are in progress. In many cases that’s probably fine, but there may be other activities which need attention, such as incoming serial data or continuous ADC sampling.
Meet the fourth SPI implementation provided by JeeH:
struct SpiDev : SpiDma {
    ...
};
This implementation is based on the SpiDma class. It has exactly the same API,
so it’s not repeated here. It differs in one crucial way from SpiDma: instead
of putting the CPU to sleep, all block transfers use JeeH’s sys::call
mechanism, which is multi-threading aware: the current thread will be suspended
while the DMA transfer is taking place, allowing other threads to run.
On the surface, things remain the same: SpiDev can be used in the same way
as the other three variants: a call to the block transfer method returns once
the transfer is finished. The difference is that other threads continue to run,
handling whatever tasks they are responsible for. Whenever no other threads
have any work to do, the CPU enters a lower-power sleep state, just like
SpiDma.
(Not further explained here: SpiDma can also be used in 100% non-blocking mode
via sys::send.)
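The difference between the two waiting strategies can be sketched on a host machine with a simulated DMA engine. Everything here is hypothetical: FakeDma, transferAndSleep, and transferAndYield are illustration names, the “DMA” only advances when step() is called, and the callback stands in for whatever other threads would do. JeeH's sys::call and its scheduler are not modelled.

```cpp
#include <cassert>
#include <cstdint>

// Simulated DMA engine: each step() moves one byte, as the hardware would
// do in the background, and flags completion (the "interrupt") at the end.
struct FakeDma {
    uint8_t const* src = nullptr;
    uint8_t* dst = nullptr;
    int remaining = 0;
    bool done = true;

    void start (uint8_t const* out, uint8_t* in, int len) {
        src = out; dst = in; remaining = len; done = (len <= 0);
    }
    void step () {
        if (remaining > 0) { *dst++ = *src++; --remaining; }
        if (remaining == 0) done = true;
    }
};

// SpiDma-like strategy: the CPU idles until the transfer has finished.
// Returns the number of idle periods spent waiting.
int transferAndSleep (FakeDma& dma, uint8_t const* out, uint8_t* in, int len) {
    int idle = 0;
    dma.start(out, in, len);
    while (!dma.done) {
        ++idle;       // stands in for "sleep until the next interrupt"
        dma.step();   // the hardware makes progress meanwhile
    }
    return idle;
}

// SpiDev-like strategy: while this caller waits, other work gets to run.
// Returns the number of time slices handed to that other work.
template <typename WORK>
int transferAndYield (FakeDma& dma, uint8_t const* out, uint8_t* in, int len,
                      WORK otherThread) {
    int slices = 0;
    dma.start(out, in, len);
    while (!dma.done) {
        otherThread();  // another thread runs instead of sleeping
        ++slices;
        dma.step();
    }
    return slices;
}
```

From the caller's point of view both functions behave the same: they return when the data has arrived. Only the second one puts the waiting time to use.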
It’s all about choice
With these four variants, the choice is completely up to the application. And the C++ template system is what makes this truly flexible: a higher-level protocol driver or peripheral-specific processing module can use any of these variants.
As an example, there is an interface implementation in JeeH for the “RFM69” ISM-band low-power radio module. Its implementation is as follows (details omitted for brevity):
template< typename SPI >
struct RF69 {
    SPI& spi;
    RF69 (SPI& s) : spi (s) {}
    void init (...) {...}
    void send (uint8_t header, uint8_t const* ptr, int len) {...}
    int receive (uint8_t* ptr, int len) {...}
    void sleep () {...}
    ...
};
This RF69 code can be used with bit-banged SpiGpio for example, connected to any µC pins:
SpiGpio spi;
spi.init(...);
RF69 rf (spi);
...
rf.send(...);
rf.receive(...);
rf.sleep();
It can also easily be switched to use another variant:
SpiDma spi (...);
spi.init(...);
RF69 rf (spi);
...
The key observation here is that the RF69 implementation remains the same
in all use cases. If you have a completely different SPI implementation with the
same enable/disable/transfer API, then the RF69 code can use it. Taken to
extremes: that RFM69 radio module could be off-site, attached via an internet
connection and an “SPI-over-TCP/IP” bridge … and it would work (slowly!).
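To show that “any type with enable/disable/transfer will do”, here is a small self-contained sketch of a templated register driver running against a simulated chip. RegDriver and FakeChipSpi are hypothetical names and not part of JeeH. The convention that the top bit of the address byte marks a write is the one used by the RFM69's radio chip, but the fake register file itself is invented.

```cpp
#include <cassert>
#include <cstdint>

// Simulated SPI device with 128 registers: the first byte after select is
// the address (top bit set = write), later bytes are data.
struct FakeChipSpi {
    uint8_t regs [128] = {};
    bool expectAddr = false;
    bool writing = false;
    int addr = 0;

    void enable ()  { expectAddr = true; }
    void disable () { expectAddr = false; }

    int transfer (int v) {
        if (expectAddr) {
            writing = (v & 0x80) != 0;
            addr = v & 0x7F;
            expectAddr = false;
            return 0;
        }
        int old = regs[addr];
        if (writing) regs[addr] = (uint8_t) v;
        addr = (addr + 1) & 0x7F;  // auto-increment for burst access
        return old;
    }
};

// Driver code is written once, against whatever SPI type it is given.
template <typename SPI>
struct RegDriver {
    SPI& spi;
    RegDriver (SPI& s) : spi (s) {}

    void writeReg (uint8_t addr, uint8_t val) {
        spi.enable();
        spi.transfer(addr | 0x80);
        spi.transfer(val);
        spi.disable();
    }

    uint8_t readReg (uint8_t addr) {
        spi.enable();
        spi.transfer(addr & 0x7F);
        int v = spi.transfer(0);
        spi.disable();
        return (uint8_t) v;
    }
};
```

The same RegDriver template would compile unchanged against a bit-banged, hardware, or DMA-based SPI type, as long as the three method names and signatures match.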
One more thing …
While the design presented so far is flexible, there is one aspect which can be an issue: if some (presumably complex) module interface is based on SPI and there are multiple instances in use, e.g. one module using DMA for performance reasons and a second one connected via SpiGpio (perhaps it needs to use specific I/O pins?), then the template system will compile two separate versions of the module interface code. With RF69, you’ll get two copies: one specialised to access module #1 via DMA and the other specialised to access module #2 via GPIO.
For a simple type such as RF69, this code overhead would be negligible. But this design is all about supporting a very large variety of module interfaces. Some may be quite complex, or perhaps the µC simply has very little flash memory.
In C++, one solution is to use “abstract virtual base classes”, which transform such polymorphism into a run-time mechanism, instead of the compile-time nature of C++’s templates. This leads to a clear trade-off between having optimised copies and run-time re-use of the same source code.
With SPI, such an approach can also be supported:
SpiWrap<SpiDma> spi1 (...);
spi1.init(...);
RF69 rf1 ((SpiBase&) spi1);
SpiWrap<SpiGpio> spi2;
spi2.init(...);
RF69 rf2 ((SpiBase&) spi2);
...
There are now two radios and two instances of RF69, called rf1 and rf2. But
since they are both based on objects of the same type (SpiBase), the RF69
code is not duplicated into two different specialisations. The RF69 methods
exist exactly once in the application. When the RF69 methods are called, they
will use C++’s standard “v-table dispatch” to determine at run-time which SPI
bus interface to use.
A side-effect of this approach is that you can pass around a pointer to either
SPI bus, by using SpiBase* mySpi = &spi1; (or spi2). At “base class” level,
the SPI buses are interchangeable.
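The definitions of SpiBase and SpiWrap are not shown in this post. One plausible way to build such a wrapper is sketched below, under deliberately different names (SpiBaseSketch, SpiWrapSketch) to make clear that this is a guess at the technique, not JeeH's code. EchoSpi, InvertSpi, and Probe are made-up stand-ins for two SPI variants and a driver.

```cpp
#include <cassert>

// Abstract interface: one v-table entry per SPI operation.
struct SpiBaseSketch {
    virtual ~SpiBaseSketch () = default;
    virtual void enable () = 0;
    virtual void disable () = 0;
    virtual int transfer (int v) = 0;
};

// Wrapper: inherits the concrete SPI type (so init etc. stay accessible)
// and forwards the virtual calls to it.
template <typename SPI>
struct SpiWrapSketch : SpiBaseSketch, SPI {
    void enable () override { SPI::enable(); }
    void disable () override { SPI::disable(); }
    int transfer (int v) override { return SPI::transfer(v); }
};

// Two unrelated fake SPI implementations with the same method names.
struct EchoSpi {
    void enable () {}
    void disable () {}
    int transfer (int v) { return v & 0xFF; }
};
struct InvertSpi {
    void enable () {}
    void disable () {}
    int transfer (int v) { return ~v & 0xFF; }
};

// Templated driver, as in the RF69 example.
template <typename SPI>
struct Probe {
    SPI& spi;
    Probe (SPI& s) : spi (s) {}
    int exchange (int v) {
        spi.enable();
        int r = spi.transfer(v);
        spi.disable();
        return r;
    }
};

// Both drivers are the same type, Probe<SpiBaseSketch>, so its code is
// instantiated only once; the SPI variant is picked at run time.
int demo () {
    SpiWrapSketch<EchoSpi> spi1;
    SpiWrapSketch<InvertSpi> spi2;
    Probe<SpiBaseSketch> p1 (spi1);
    Probe<SpiBaseSketch> p2 (spi2);
    return p1.exchange(0x0F) * 256 + p2.exchange(0x0F);
}
```

The cost is one indirect call per SPI operation, which is usually negligible next to the time the transfer itself takes.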
This example is a bit contrived (who needs two RFM69 modules on one µC?), but it illustrates that there’s a lot of choice. In most scenarios, the simple notation is best, as it’s the most efficient (in both code size and speed), but when the flexibility is needed, the “wrapped” approach is available.
It has taken me quite a bit of time to arrive at this design. In JeeH v1, the
static types of everything made it impossible to implement these wrappers. For
the infinitely curious: a code example can be found in the dev branch of
JeeH’s git repository, see examples/hytiny/poll.cpp.
To sum it all up: higher-level code does NOT need to limit the way it can be used in an application!