Mar 31
2015

CI20: The DDR odyssey, part 1: PLLs

This post is part of the CI20 bare-metal project (link leads to index of all posts), and is part one of three or perhaps four posts about initialising the DDR on the CI20 Creator.

The CI20 has 1GB of DDR3 SDRAM onboard. Communicating with DDR is quite complex. Actually even the reason that it’s complex is somewhat complex: essentially a trade-off was made early on to keep the actual RAM as simple and cheap as it could possibly be without compromising performance. The result of that decision is that a lot of the control circuitry lives in a separate bit of hardware called a DDR controller. On the CI20, the DDR controller is part of JZ4780 SOC. Modern Intel chips are similar, having the DDR controller as part of the CPU package.

The DDR controller is complemented by another bit of hardware (“IP block” in hardware design speak) called the DDR PHY. These two tightly-integrated parts split the task of DDR control into a high level and a low level:

The DDR controller maps the multiple RAM chips to a logical, flat address space. It takes into account the timings of the RAM (how fast you can read and write it, basically), manages DRAM refresh, and uses its knowledge of the physical layout of the RAM to try to maximise performance at the protocol level. It communicates with the DDR through the PHY.
The DDR PHY manages the physical interface to the RAM. The RAM is in a separate chip from the PHY (which is part of the system-on-chip), connected by long (to a computer) circuit-board traces. The PHY is in charge of sending high-speed signals along these wires and dealing with the complexities involved. For example, without proper impedance matching, signals can reflect off the end of the wire and bounce back to interfere with incoming signals. The PHY also knows about details of the RAM timing, so it can optimise the speed at which it communicates with the DDR at a physical level. The PHY is kind of fascinating, not least because it’s almost certainly the least-well-documented part of the CI20 apart from the GPU.

So the DDR controller and PHY do all the really hard work, but in order to do it they have to know a lot of information about the RAM. Getting that information into these two parts is the subject of the next few posts. Before we get there, we have one more bit of groundwork to cover: supplying a clock signal to the DDR controller, the PHY, and the DDR itself.

Phase-locked loops

The CI20 has two external oscillators. One is the very slow 32KHz oscillator used for the real-time clock. The other is a much-faster 48MHz oscillator used by everything else. Well, “much faster” is relative — 48MHz is still a long way from the 1.2GHz we need to run the CPU at full speed, or even from the 400MHz required for the DDR. So how do we generate these much faster clocks?

The answer is a circuit called a phase-locked loop, or PLL. The electrical-engineering details of a PLL are out of scope for this post, and, frankly, out of scope for my brain, but conceptually they seem simple enough: they generate a frequency which is some multiple of their input frequency, and stay synchronised by using a phase detector. If the output is out of phase with the input, the PLL will either speed up or slow down its internal oscillator until the phases match. In other words, the speed of the oscillator is controlled by a feedback loop.

PLLs typically also incorporate a frequency divider (or two), so you can essentially multiply the input frequency by any fraction you like, within the range of numerator and denominator supported by your specific PLL.

The JZ4780 has four PLLs: APLL, MPLL, VPLL, and EPLL. These names seem arbitrary, but they are apparently traditional — plenty of non-jz47xx code refers to PLLs with these names. In fact, even the purposes of these PLL names are re-used, to some extent, between devices. For example, VPLL is used to drive the video hardware, and EPLL is often audio.

Initialising a JZ4780 PLL is simple enough, with the obvious-sounding caveat that we shouldn’t re-initialise a PLL that’s already being used to drive the CPU clock, and the less-obvious caveat that we can’t change a PLL’s speed to more than about 20% faster or slower without halting and restarting it, because it will “lose lock” (the feedback loop will go out of synchronisation). This is relevant to us because the USB bootloader built in to the JZ4780 initialises the first PLL, PLLA, and uses it to drive the CPU. Consequently the code for this post initialises the second PLL, PLLM, and uses that one for the CPU and DDR.

The code is under the tag “plls”:

$ git clone https://github.com/nfd/ci20-os
$ cd ci20-os
$ git checkout tags/plls

… and can be built the regular way:

$ make bootloader.bin
$ python3 usbloader.py bootloader.bin

(If you boot this, you should see the CI20 begin a memory test, but never finish it. This is to be expected — the DDR clock is initialised, but the DDR controller isn’t yet. So the memory test ends up attempting to write to an address which doesn’t exist.)

The PLL code is in the file pllclock.c and is fairly lavishly commented. Perhaps the most interesting parts are:

The magic numbers defined at the top (CI20_PDIV, CI20_H2DIV and so on) come from Ingenic’s reference code and determine how fast particular parts of the SOC should run. This aspect of the SOC (the timing information) isn’t part of the public documentation, so we don’t have much choice but to re-use these numbers as is: they set up various dividers to ensure that the peripherals run at about 100MHz, the AHB buses run at 200MHz and 400MHz, and the L2 cache runs at half the speed of the CPU. Presumably they could be changed, but I don’t know to what extent.
Switching the CPU (and friends) to a new PLL is a two-step process: first the frequency dividers are installed, and then the PLL source is switched over.
CI20 PLLs have one multiplier and two dividers. The first divider is applied to the input frequency, and the second divider is applied to the output frequency. I don’t know when you’d use one and when you’d use the other, but the reference code uses the input divider for the CPU’s PLL and the output divider for PLLs for video and audio.
The PLL is set up with a multiplier of twice the speed required, and then with a divider of two. Apparently this is a reasonably common thing to do, to reduce jitter. Or to normalise the duty cycle. It’s not clear which.

Magic ahead

Sadly, we are starting to enter a realm of magic numbers which describe undocumented aspects of the hardware, generally typically related to timing. This happens a little with the PLLs, just slightly more with the DDR controller (which has reasonable documentation), and significantly more with the DDR PHY (which has no official documentation at all). Nonetheless the situation is far from hopeless: it’s possible to get quite a clear idea of what’s going on even without official documentation, as we’ll see in the next few posts.

Previous posts

About

CI20: The DDR odyssey, part 1: PLLs