2016
CI20: Dual cores! Double your fun!
We’re back with some actual code! This time, let’s bring up the second core.
The CI20’s JZ4780 SOC has two MIPS cores on it. By default, though, it starts up with only the first core running, which makes sense — a single core presents the simplest startup environment, and it’s easy enough for the first core to bring up the others.
There are apparently several competing standards for multicore MIPS systems. Refreshingly, the JZ4780 doesn’t follow any of them (well, let’s say it follows the “XBurst standard”, as implemented by… the JZ4780). However, once you know what’s involved, it’s very simple to start up a new core. The key code is in soc/jz4780/multicore.c. There are four main steps:
- Ensure the core has a clock source.
- Set an entry point for the new core.
- Ensure the core is powered up.
- Take the core out of reset.
CP0 extensions for multicore
- CORE_CTL: determines whether the core is in sleep mode and whether it is in reset. On startup, CORE1 is held in reset — taking it out of reset (assuming it has power and a clock source) will immediately start it running at its entrypoint. CORE_CTL also has a flag which determines whether a core will use the default (ROM) entry point, or a custom one.
- CORE_REIM: allows us to set our custom entry point.
Low-power and clock gating settings
As a designed-for-mobile SOC, the JZ4780 has a lot of support for controlling energy use. You can selectively power-down almost everything in the SOC (the key register here is LCR, for Low-Power Control Register), and you can also selectively disable the clock source for most system modules using “clock gating”. There is a little more info on this lower down, if you’re interested, but the basics are that you need to enable the core’s clock and make sure it’s not in one of several low-power modes before you can start it up.
Running the code
Check out and run this code in the usual way (the release tag for this version is “multicore”, i.e. git checkout tags/multicore).
You should see “Hello, world!” followed by (or perhaps partially interspersed with) “Hello, multicore world!”, followed by three rows of 00000000. What’s with those?
In a previous release, we enabled interrupt handling and set up a timer to increment a counter. The kernel entrypoint would then print the value of this counter at regular intervals. But in this release, the code in main.c doesn’t register a counter (and actually the init code doesn’t even set up the timer). However, if you look further down main.c, you will see something interesting. The second core is incrementing the counter like mad, but the first core keeps printing 00000000, even though the “counter” variable is volatile.
The reason is that both cores are accessing “counter” from their cache. Each core has its own level-1 cache (“primary cache” in MIPS terminology), which means that if caching is enabled we need to do something special to ensure that the caches remain coherent, i.e. that they agree with each other. However, if you look in start.S, you will see that we are explicitly not doing this:
/* Enable caching in kseg0 */
li t0, CACHE_MODE_CACHABLE_NONCOHERENT
mtc0 t0, CP0_CONFIG
So we can just change that CACHE_MODE_CACHABLE_NONCOHERENT to CACHE_MODE_UNCACHED and it will work, right? Right, but if you try this you will notice that everything runs much slower (and you may want to change the delay loop in main() from 0xfffffff to, say, 0xfffff, so you’re not waiting for ages).
Fortunately, turning off caching entirely is not the only solution to this problem. More on that in the next installment.
Implementation notes
Stacks: The kernel now declares its own stack (as part of its BSS). Previously, start.S was simply carving off a piece of RAM to use as a stack. This obviously doesn’t make sense for anything other than a tutorial, but it gets worse with multiple cores. architecture/mips32/kernelstack.c declares a (small) kernel stack for each core, and architecture/mips32/start.S and soc/jz4780/multicore_lowlevel.S make use of it.
CP0 accessor functions: We now take a FreeBSD-inspired approach to, e.g., writing to device registers or to CP0 — small inline functions with sensible names, used from C. This is nicer than bit shifting and poking values, and much nicer than dropping down to assembly language. An example of this approach is in include/soc/jz4780/jz47xx-cgm.h, which defines a number of functions for gating and ungating clock sources, as used by soc/jz4780/multicore.c.
Library-ification: Start.S, being MIPS-specific, is now part of libsystem.a. Unfortunately this means that it needs to be in its own section, so that the linker can be instructed to place it before any other code. I’m still not entirely happy with what’s going on here behind the scenes — GNU ld seems to be creating a LOAD segment which has zero length and is entirely BSS, which doesn’t make much sense to me — but the outcome is positive overall.
Supplemental material: Coprocessor 0
I have hinted at CP0 a few times during this series, but I haven’t explicitly addressed it. The MIPS design includes support for “coprocessors”, which are accessed using special instructions. Coprocessor 0, which is required, is mostly about memory management and caching, but it also has a grab-bag of other things, such as processor information (make, model, size of cache line), plus implementation-defined portions which can do anything — such as turn on the second core, in the case of the JZ4780.
There are two relevant assembly-language instructions: mtc0 (move register value to coprocessor 0) and mfc0 (move from coprocessor 0 to register). In this release, any access to them from C is wrapped up in CP0 accessor functions (see above).
Supplemental material: everything you would have wanted to know about clock gating, had you known that clock gating existed
Q: What is clock gating?
A: Clock gating is an energy-saving measure. Switching transistors on and off repeatedly, which is what a clock does, takes energy. In a complex system like the JZ4780, a single clock is connected to multiple modules — so it’s very likely that the clock will be running, but not all modules will be active. In this case, you can gate (disable) the clock source for the modules you aren’t using, without having to turn the clock off completely.
Q: How does clock gating apply here?
A: Well, the jz4780 lets you gate pretty much everything, including the second processor core — and, on startup, pretty much everything, including the second core, is in fact gated. If we enable CORE1 with a gated clock it won't execute any instructions.
Q: I notice that the JZ4780 programmer’s manual refers to CORE1’s clock gate bit as “P1”, with no further information. Did figuring out that P1 meant CORE1 take any time?
A: No, none at all. Well, not much. I mean, a little. Well, definitely not much more than I’d want to admit.
Q: I also notice that the JZ4780 programmer’s manual refers to CORE1 as “SCPU”, with no further information, in documentation for the same module. I guess that was similarly easy to recognise?
A: *hopeless stare*
Q: Look, would you like a coffee?
A: I’d love a coffee!