Apr 7, 2015

CI20: The DDR odyssey, part 2: getting it working

This post is part of the CI20 bare-metal project (link leads to index of all posts), and is part two of three or perhaps four posts about initialising the DDR on the CI20 Creator. This one is interesting, though, because by the end of it we have usable RAM.

DDR3 is these two chips (and two on the other side)

DDR RAM is designed to make the RAM chips as cheap as possible by offloading a lot of the task of driving them to a separate chip. This chip typically contains two IP blocks: a DDR controller (DDRC), which does higher-level control of the RAM, and a DDR PHY, which handles the physical layer. To get the DDR working, we have to tell the DDRC and PHY a large amount of information about the physical characteristics of the RAM.

How do we get this information? Various sources.
  • A lot of the required information has standardised names which can be read straight out of the RAM datasheet.
  • Most of the DDRC registers are documented in the JZ4780 programmer’s manual.
  • For the stuff that isn’t documented, we can take the required values from sample code, such as Ingenic’s board support code, u-boot, or ci20-tools.
But in addition to just having something which works it would also be rather nice to know what is going on. That isn’t so easy:
  • The DDRC, while mostly well-documented, is still missing some information which is supplied in the source as magic numbers.
  • The PHY is not documented at all. I wonder if it’s actually licensed from someone else? In any case, we can get some information about what the registers are from their symbolic names, from what is put into them, and, as a last resort, from datasheets for similar PHY blocks (PDF).
  • The sample code, even the best version of it (ci20-tools), is not great. This isn’t the programmers’ fault but simply a consequence of a bad original version (the Ingenic board support package).
All the sample code for DDR3 initialisation is written in C, but I ended up writing a Python program which generates C code. Doing the hard work in Python made life much easier, because it is far simpler to separate concerns there. For example, here is the code to initialise a register with the DDR timing value named tRTP:

Name: tRTP
Description: READ to PRECHARGE command period
Value (from the datasheet): 4 DDR clock cycles or 7.5 nanoseconds, whichever is greater.

C implementation:

#define DDR_tRTP DDR_MAX(4, 7500)
...
tmp = DIV_ROUND_UP(DDR_tRTP * 1000, ps);
if (tmp < 1) tmp = 1;
if (tmp > 6) tmp = 6;
ddrc_timing1 |= (tmp << DDRC_TIMING1_TRTP_BIT);
/* ... other register values ... */
writel(ddrc_timing1, DDRC_TIMING(1));

Python implementation:

ram.tRTP = NS('max(4 * nCK, 7.5)')
hardware.write_register('DDR.DTIMING1', tRTP=ram.tRTP.ticks, other register values)

It’s hopefully pretty clear that the Python code is easier to understand. The key helpful part here is that tRTP becomes an object with two attributes “ns” and “ticks” — the first being the timing value in nanoseconds, and the second being the timing value in multiples of the DDR clock cycle. This reflects the fact that timing values are specified in nanoseconds (and any calculations on timing values are usually done in nanoseconds), but they are ultimately written into DDRC and PHY registers as multiples of a DDR clock cycle (one clock tick is 2.5 nanoseconds, at 400MHz).
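To make the ns/ticks idea concrete, here is a minimal sketch of such an object. The class name TimingValue and the constants are made up for this illustration and are not the actual classes in ddr.py; the only facts taken from the post are the 400MHz clock and the tRTP rule.

```python
import math

DDR_CLOCK_HZ = 400_000_000          # 400 MHz, as stated in the post
NS_PER_TICK = 1e9 / DDR_CLOCK_HZ    # 2.5 ns per DDR clock cycle


class TimingValue:
    """Hypothetical stand-in for the timing objects in ddr.py."""

    def __init__(self, ns):
        self.ns = ns

    @property
    def ticks(self):
        # Round up: waiting slightly too long is safe, too short is not.
        return math.ceil(self.ns / NS_PER_TICK)


# tRTP: 4 clock cycles or 7.5 ns, whichever is greater.
tRTP = TimingValue(max(4 * NS_PER_TICK, 7.5))
print(tRTP.ns, tRTP.ticks)
```

At 400MHz the four-cycle minimum (10 ns) wins over 7.5 ns, so the register field ends up holding 4 ticks.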

You can view the Python online here: ddr.py. The interesting stuff is closer to the bottom of the file.

Class AutogenOutput produces C output based on method calls, so it defines what sort of operations can be performed to initialise RAM. For example, calling write_register causes AutogenOutput to produce a line of C code which modifies a register. Other operations include waiting for some time interval, updating only parts of a register, and repeatedly reading from a register until its value equals some predefined setting. These are all the operations which are required to initialise the DDR.

The actual initialisation is done in the init_ram function (which calls init_phy). It is full of function calls which look like this:

hardware.note('reset DDRC')
hardware.write_register('DDR.DCTRL', DFI_RST=1, DLL_RST=1, CTL_RST=1, CFG_RST=1)
hardware.write_register_raw('DDR.DCTRL', 0)

… where hardware is an instance of the AutogenOutput class. 
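As a rough idea of how a class like this can turn method calls into C, here is a much-reduced sketch. SketchOutput, its field table, the bit positions and the generated C names are all invented for illustration; the real AutogenOutput in ddr.py will differ, and the real bit layout comes from the JZ4780 manual.

```python
class SketchOutput:
    """Hypothetical, much-reduced imitation of AutogenOutput."""

    # Bit positions are invented for illustration, NOT taken from the manual.
    FIELDS = {
        'DDR.DCTRL': {'CFG_RST': 0, 'CTL_RST': 1, 'DLL_RST': 2, 'DFI_RST': 3},
    }

    def __init__(self):
        self.lines = []  # generated C source, one statement per entry

    def note(self, text):
        # Emit a C comment so the generated code stays readable.
        self.lines.append('/* %s */' % text)

    def write_register_raw(self, name, value):
        self.lines.append('writel(0x%08x, %s);' % (value, self._c_name(name)))

    def write_register(self, name, **fields):
        # Pack the named fields into a single register value.
        value = 0
        for field, field_value in fields.items():
            value |= field_value << self.FIELDS[name][field]
        self.write_register_raw(name, value)

    def wait_until_equal(self, name, value):
        # Busy-wait until the register reads back the expected value.
        self.lines.append('while (readl(%s) != 0x%08x) ;'
                          % (self._c_name(name), value))

    @staticmethod
    def _c_name(name):
        # Assumed naming scheme: 'DDR.DCTRL' becomes the C symbol DDR_DCTRL.
        return name.replace('.', '_')


hardware = SketchOutput()
hardware.note('reset DDRC')
hardware.write_register('DDR.DCTRL', DFI_RST=1, DLL_RST=1, CTL_RST=1, CFG_RST=1)
hardware.write_register_raw('DDR.DCTRL', 0)
print('\n'.join(hardware.lines))
```

The point of the design is that init_ram only ever describes what should happen; how that becomes C lives in one place.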

Further down is the generate function, which establishes the RAM timing parameters. The timing parameters are evaluated on-demand, which means they don’t need to be in any particular order — and they can be arbitrarily complex expressions. For example, the timing value tWR is a relatively simple 15 nanoseconds:

ram.tWR = NS(15)

… but the timing value tRTW is quite complex:

ram.tRTW = TCK('ram.tRL.ticks + ram.tCCD.ticks + 2 - ram.tWL.ticks')
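On-demand evaluation can be sketched like this: the expression is kept as a string and only evaluated when someone reads .ns or .ticks, so it does not matter that it mentions values which are assigned later. The Lazy and Ram classes and the numbers for tRL, tCCD and tWL are assumptions for illustration only, not the datasheet values and not the real NS/TCK classes.

```python
import math

NS_PER_TICK = 2.5  # 400 MHz DDR clock, as stated in the post


class Lazy:
    """Hypothetical timing value whose expression is evaluated when read."""

    def __init__(self, ram, expr, unit):
        self.ram, self.expr, self.unit = ram, expr, unit

    def _value(self):
        if isinstance(self.expr, str):
            # Evaluated only now, so the names it mentions may be set later.
            return eval(self.expr, {'ram': self.ram, 'math': math})
        return self.expr

    @property
    def ns(self):
        v = self._value()
        return v if self.unit == 'ns' else v * NS_PER_TICK

    @property
    def ticks(self):
        v = self._value()
        return v if self.unit == 'ticks' else math.ceil(v / NS_PER_TICK)


class Ram:
    """Plain attribute bag, like the ram object in the post."""


ram = Ram()
# tRTW is declared before the values it depends on.
ram.tRTW = Lazy(ram, 'ram.tRL.ticks + ram.tCCD.ticks + 2 - ram.tWL.ticks', 'ticks')
ram.tRL = Lazy(ram, 6, 'ticks')    # illustrative value, not from the datasheet
ram.tCCD = Lazy(ram, 4, 'ticks')   # illustrative value
ram.tWL = Lazy(ram, 5, 'ticks')    # illustrative value
ram.tWR = Lazy(ram, 15, 'ns')      # 15 ns, as in the post

print(ram.tRTW.ticks, ram.tWR.ticks)
```

With these made-up inputs tRTW comes out as 6 + 4 + 2 - 5 = 7 ticks, and tWR's 15 ns becomes 6 ticks.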

Further down the generate function is a set of conditional settings depending on whether the initialisation should follow the reference code exactly or not. When writing the generator, I noticed some discrepancies between the reference code and the DDR datasheet, as well as what I’m at least 90% sure is a genuine bug. For example, the reference code stores what is apparently a nanosecond value into a register directly:

ram.phy_dtpr2_tCKE = TCK('math.ceil(ram.tCKE.ns)')

(note the forced conversion between nanoseconds and ticks), whereas the correct value should be in terms of ticks:

ram.phy_dtpr2_tCKE = ram.tCKE
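To see how much difference this makes, here is the arithmetic for an assumed tCKE of 7.5 ns. That figure is picked purely for illustration and is not taken from the CI20's RAM datasheet; the 2.5 ns tick is the 400MHz clock mentioned earlier.

```python
import math

NS_PER_TICK = 2.5   # 400 MHz DDR clock, as stated in the post
tCKE_ns = 7.5       # ASSUMED value, for illustration only

# Reference behaviour: the nanosecond count is written as if it were ticks.
reference_value = math.ceil(tCKE_ns)

# Corrected behaviour: convert nanoseconds to ticks first.
corrected_value = math.ceil(tCKE_ns / NS_PER_TICK)

print(reference_value, corrected_value)
```

Under that assumption the reference code would write 8 where 3 is wanted. Erring on the long side like this probably explains why the reference code works anyway.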

In any case, that’s enough picking through code. You can check out the DDR-initialising bootloader using the “ddr” tag from the usual place:

$ git clone https://github.com/nfd/ci20-os
$ cd ci20-os
$ git checkout tags/ddr

If you make and install this, you will see a memory test passing.

Next up: a very interesting part of memory initialisation which this version completely avoids: DDR address remapping! After that, we’ll look at actually loading something with our boot loader.

JZ4780 USB Loading should start at 0xf4000800

A minor change: you’ll notice that the linker script and usbloader.py have changed to use the start address of 0xf4000800 — 2048 bytes higher than previously. This was a pretty annoying bug to track down: the first 2k of my bootloader binary was running fine, but any code beyond that point just wasn’t working at all. It turns out that if you load into the first 2k of TCSM, you can’t write more than 2k. I don’t know why this is (though I’m sure it’s to do with the fact that TCSM is divided into 2k “banks”), and the documentation doesn’t help (in fact, it flat-out states that TCSM for bootloading starts at 0xf4000000), but skipping the first 2k solves the problem.
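For reference, the new start address is just the documented TCSM base plus the 2k that gets skipped. The constant names below are made up for this snippet and are not the ones used in usbloader.py.

```python
TCSM_BASE = 0xf4000000   # documented TCSM start for bootloading, per the post
SKIP_BYTES = 2048        # the first 2k, which we avoid

load_address = TCSM_BASE + SKIP_BYTES
print(hex(load_address))
```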