Sun
May 3

CI20: Interrupt handling

This post is part of the CI20 bare-metal project — a project to write operating-system-level code on the CI20 MIPS-based demo board.

In the previous instalment we ran a memory tester and verified that DDR was initialised. Let’s start adding real OS features, starting with interrupts. We’ll add a generic interrupt mechanism and then apply it to the timer interrupt.

I don’t have any relevant pictures to go with this post, so here are some nesting swans I saw recently in Exeter:


Silly swans, building their nest so close to the path that the council had to give them a bit of privacy. They didn’t seem to care, though. Well, on to interrupt support on the CI20.

Generic interrupt support

The CI20 has a multi-level approach for handling interrupts.
  1. Firstly, the global interrupt enable flag in the CP0 STATUS register must be switched on.
  2. Then, each individual interrupt must be unmasked in the same register.
  3. Then, the interrupt controller hardware must unmask the interrupt for a particular device.
  4. Finally, the device itself must be configured to generate interrupts.
After you do all of this, the CPU will jump to a special location when an interrupt occurs, after saving the program counter (in the CP0 EPC register) and updating some state (such as the CAUSE register). All the rest is up to software.
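
To make the first two steps concrete, here is a minimal sketch of the bit manipulation involved. The bit positions are the standard MIPS32 ones (IE is bit 0 of STATUS, and the eight interrupt mask bits IM0 to IM7 sit at bits 8 to 15), but the helper name status_enable_irq is made up for illustration and is not part of libci20. Real code would read STATUS with mfc0, pass it through something like this, and write it back with mtc0. Steps 3 and 4 are device-specific and are not shown.

```c
#include <stdint.h>

/* Standard MIPS32 CP0 STATUS bit positions. */
#define STATUS_IE        (1u << 0)  /* global interrupt enable */
#define STATUS_IM_SHIFT  8          /* IM0..IM7 occupy bits 8..15 */

/* Hypothetical helper: given the current STATUS value, return a value with
 * the global enable flag set and CPU interrupt line `line` (0..7) unmasked.
 * All other bits are left unchanged. */
uint32_t status_enable_irq(uint32_t status, unsigned line)
{
    status |= STATUS_IE;                         /* step 1: global enable */
    status |= 1u << (STATUS_IM_SHIFT + line);    /* step 2: unmask one line */
    return status;
}
```

For example, status_enable_irq(0, 2) yields 0x401: bit 0 (IE) plus bit 10 (IM2).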

The special location is well-defined, but is well-defined to be in any of four places, depending on CPU flags:
  • If BEV is set in the CP0 STATUS register, the address is in uncached memory.
  • If IV is set in the CP0 CAUSE register, the address for interrupts is distinct from the address for other types of exceptions, otherwise it’s the same.
We don’t want to use uncached memory, and, in fact, on the CI20, we can’t, as the address is 0xBFC00380, in memory-mapped device territory, so we’ll leave BEV unset. However, we *do* want separate addresses for interrupts and other exceptions, because this means that there’s a little less work to do when an interrupt or exception arrives, so we’ll set IV. 
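
The four possible locations can be worked out with a little arithmetic. The sketch below assumes the standard MIPS32 layout: a base of 0x80000000 when BEV is clear (the reset value of EBase, which I am assuming has not been changed) or 0xBFC00200 when BEV is set, plus an offset of 0x200 for interrupts when IV is set and 0x180 otherwise. It ignores the separate TLB refill and cache error vectors, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: where will the CPU jump for an interrupt or a
 * general exception, given the BEV (STATUS) and IV (CAUSE) flags?
 * Assumes the default exception base and ignores TLB refill and
 * cache error vectors. */
uint32_t exception_vector(bool bev, bool iv, bool is_interrupt)
{
    uint32_t base = bev ? 0xBFC00200u : 0x80000000u;

    /* Only interrupts get the dedicated vector, and only when IV is set. */
    uint32_t offset = (is_interrupt && iv) ? 0x200u : 0x180u;

    return base + offset;
}
```

With BEV set and IV clear this gives the 0xBFC00380 mentioned above. With BEV clear and IV set, which is the configuration we want, interrupts land at 0x80000200 and other exceptions at 0x80000180.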

Finally, we need to write an interrupt handler. The interrupt handler will consist of two parts: an assembly-language part which does the minimum necessary to safely jump into C code, and a C part which determines which interrupt has occurred and deals with it appropriately.

Here’s the assembly-language portion of the interrupt handler included in this post’s start.S:

.org 0x200
_irq_asm:
    sw at, -4(sp)
    sw v0, -8(sp)
    sw v1, -12(sp)

    [ snip: many more registers saved ]

    sw fp, -108(sp)
    sw ra, -112(sp)

    addi sp, sp, -112

    jal libci20_interrupt
    nop

    addi sp, sp, 112
    lw at, -4(sp)
    lw v0, -8(sp)
    lw v1, -12(sp)

    [ snip: many registers re-loaded ]

    lw fp, -108(sp)
    lw ra, -112(sp)

    eret

Fairly straightforward, then: save all registers, run the C portion of the interrupt handler, restore all registers, and return from interrupt. This is fine for now, but if this were to be used in a real system it would certainly want to switch to a dedicated interrupt stack — or, at the very least, make sure it was on a kernel stack.

The C portion is similarly straightforward. The CI20 has two interrupt pending registers, which are bitfields, one bit per device. A bit is set if an interrupt is pending for that device. The C routine allows device drivers to register a handler routine for their interrupt — if a handler is registered when an interrupt for that device arrives, it will be called.

Finally, the job of the handler is to inform the device that the interrupt has been handled. 
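
As a rough illustration of that dispatch logic, here is a sketch that takes the two pending values as arguments rather than reading them from hardware, so it can be tried on a desktop machine. The names (irq_register, irq_dispatch, demo_handler) and the 64-entry table are my own placeholders and may not match libci20/interrupts.c.

```c
#include <stdint.h>

#define IRQ_COUNT 64  /* two 32-bit pending registers, one bit per device */

typedef void (*irq_handler_t)(void);

/* File-scope and uninitialised, so this table ends up in BSS. */
irq_handler_t irq_handlers[IRQ_COUNT];

/* Hypothetical registration call for device drivers. */
void irq_register(unsigned irq, irq_handler_t handler)
{
    if (irq < IRQ_COUNT)
        irq_handlers[irq] = handler;
}

/* Call the registered handler for every pending bit. Returns the number of
 * handlers invoked; pending bits with no handler are skipped. */
unsigned irq_dispatch(uint32_t pending0, uint32_t pending1)
{
    uint64_t pending = ((uint64_t)pending1 << 32) | pending0;
    unsigned called = 0;

    for (unsigned irq = 0; irq < IRQ_COUNT; irq++) {
        if (((pending >> irq) & 1u) && irq_handlers[irq]) {
            irq_handlers[irq]();
            called++;
        }
    }
    return called;
}

/* Stand-in for a driver's handler, used only to exercise the sketch. */
unsigned demo_hits;
void demo_handler(void) { demo_hits++; }
```

A real handler would also acknowledge the interrupt at the device, as described above; the demo handler here only counts calls.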

The timer interrupt

The “OS timer” device, used by timer.c, can be set up to generate an interrupt whenever the timer reaches a 32-bit comparison value. We previously initialised the timer to tick 3 million times a second, so let’s get it to generate an interrupt every millisecond by setting our comparison value equal to 3 million / 1000 = 3000. The timer then registers its interrupt handler for TCU0, which is the timer unit:

intc_register_handler_tcu0(ostimer_interrupt);

When a timer interrupt occurs, ostimer_interrupt is called. The only thing it absolutely has to do is to tell the TCU that the interrupt has been handled:

poke32(TFCR, TFR_OSTFLAG);

… but if that’s all it did then we wouldn’t even know it was working. So in addition to silencing the interrupt, we add support for timer callbacks, functions which are invoked by the timer interrupt handler:

void ostimer_interrupt(void)
{
    for(int i = 0; i < timer_callback_count; i++)
        timer_callbacks[i]();

    /* Clear interrupt flag. If we don't do this we will immediately return to
     * this interrupt on exit! */
    poke32(TFCR, TFR_OSTFLAG);
}

At this point, finally, we can register a callback handler in our main() function and increment a 1ms counter.
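
Here is roughly what that looks like, as a sketch that can be compiled off-target. The timer_callbacks array and timer_callback_count come from the handler above; timer_add_callback, count_ms and simulate_timer_interrupt are names I have invented for this example, and the simulated interrupt leaves out the poke32 call because there is no TCU to talk to.

```c
#include <stdint.h>

#define OSTIMER_HZ           3000000u              /* ticks per second */
#define TICKS_PER_MS         (OSTIMER_HZ / 1000u)  /* comparison value: 3000 */
#define MAX_TIMER_CALLBACKS  4

typedef void (*timer_callback_t)(void);

timer_callback_t timer_callbacks[MAX_TIMER_CALLBACKS];
int timer_callback_count;

/* Hypothetical registration function: returns 0 on success, -1 if full. */
int timer_add_callback(timer_callback_t cb)
{
    if (timer_callback_count >= MAX_TIMER_CALLBACKS)
        return -1;
    timer_callbacks[timer_callback_count++] = cb;
    return 0;
}

/* The callback main() would register: one call per millisecond. */
volatile uint32_t ms_elapsed;
void count_ms(void) { ms_elapsed++; }

/* Stand-in for ostimer_interrupt without the hardware acknowledgement. */
void simulate_timer_interrupt(void)
{
    for (int i = 0; i < timer_callback_count; i++)
        timer_callbacks[i]();
}
```

The counter is volatile because main() reads it while the interrupt handler modifies it behind the compiler's back.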

Running the code

Check out the code as usual, this time using the interrupts tag:

$ git clone https://github.com/nfd/ci20-os
$ cd ci20-os
$ git checkout tags/interrupts

Make sure you have also checked out pyelftools if you haven’t already:

$ cd thirdparty/

Now build and run. I now use a single command for this:

$ make && python3 usbloader.py build/stage1.elf && sleep 1 && python3 usbloader.py build/kernel.elf

Benchmarking 

If all goes well, you should see a short benchmark run three times, printing something like this:

00000C3C
00000C3C
00000C3C

This is the number of milliseconds taken to run a simple delay loop in main(). It doesn’t mean very much by itself, but I was curious to see how what we’ve got so far compared with Linux. So I wrote a short Linux benchmark which did the same thing (download it here), booted my CI20 into Linux, ran the benchmark, and got these results:

00000C51
00000C48
00000C49

In other words, Linux has more variance and is slightly slower than our OS. This is exactly as we’d expect: Linux is running other things behind the scenes, which will both cause the increased variance and slow the benchmark down. The results are within 1% of each other, however (3132 ms for us against 3144 to 3153 ms for Linux), which is encouraging: we did all the right things so far, or, at least, we did them as right as Linux does.
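
To put numbers on that comparison, the hex values convert to 3132 ms for our kernel and 3153, 3144 and 3145 ms for Linux. The small helper below (a made-up name, not part of the benchmark) expresses the slowdown in hundredths of a percent using integer arithmetic only.

```c
#include <stdint.h>

/* Hypothetical helper: how much slower is `theirs_ms` than `ours_ms`, in
 * hundredths of a percent (so 100 means 1%)? The result is truncated
 * towards zero. Assumes theirs_ms >= ours_ms and ours_ms != 0. */
uint32_t slowdown_hundredths_pct(uint32_t ours_ms, uint32_t theirs_ms)
{
    return (theirs_ms - ours_ms) * 10000u / ours_ms;
}
```

For the three Linux runs this gives 67, 38 and 41, that is, roughly 0.67%, 0.38% and 0.41% slower than the bare-metal result.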

Other changes

This release includes quite a few changes:
  • “os” became “kernel” — which makes much more sense.
  • The kernel-mode stuff is mostly contained in a library, libci20, which is used by both stage1 and kernel. But use of the library started to diverge in this section, and will certainly diverge further. For example, both stage1 and kernel require a timer, but they use it differently: stage1 uses it for busy-waiting, while kernel uses it to generate periodic interrupts. Also, stage1 needs to be less than 14K, so there’s no room for fancy extra features. There is no perfect solution to this when you’re working in C. My solution is to link two different libraries, libci20 and libci20_mini. The mini version doesn’t add any more files, but has its own simple implementations of some things (like the timer). The Makefile changed to reflect this.
  • The kernel’s assembly-language startup file, start.S, now zeroes out BSS. It didn’t do it before because previously we didn’t have a BSS section. (BSS (https://en.wikipedia.org/wiki/.bss) is where all uninitialised file-scope variables get placed — like the array defined in libci20/interrupts.c.) The kernel’s linker.lds file changed to accommodate the new sections, and also to align the data blocks to the length of a cache line. Note that the BSS section (and its architecture-specific friend, .sbss) is marked as “NOLOAD” — which means it takes up no space in the file at all.
  • The USB loader changed again, this time to pad uploaded data to a multiple of 2k when writing to TCSM. Experiments with crossing 2k block boundaries failed unless the data were padded. I have no idea why this peculiarly hardware-specific quirk works, or even if it’s doing the right thing, but it does seem to work.
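
The padding in the last item above is ordinary round-up-to-a-multiple arithmetic. The USB loader itself is written in Python, so the following is only a C sketch of the calculation, with a made-up function name, to match the rest of the code in this post.

```c
#include <stddef.h>

#define TCSM_BLOCK 2048u  /* the 2k block size mentioned above */

/* Hypothetical helper: round a length up to the next multiple of 2k.
 * Works because the block size is a power of two. */
size_t pad_to_block(size_t len)
{
    return (len + TCSM_BLOCK - 1) & ~(size_t)(TCSM_BLOCK - 1);
}
```

A 2049-byte upload, for example, would be padded to 4096 bytes, while one that is already exactly 2048 bytes is left alone.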

The end, or just the beginning?*

We’ve now got all the resources we need to start writing an operating system. Next time we’ll begin on that, starting with the scheduler.

* Probably not the beginning.