Apr 19

CI20: The DDR odyssey, part 4: memtester

I can’t think of a worse type of bug than one related to faulty RAM. Actually, that’s not true — probably concurrency bugs are worse. Oh well, so much for the strong opening. In any case, we’ve spent the last 3 posts initialising the RAM, so let’s now run memtester and make sure it works.

The CI20 bare-metal project: all posts

Memtester is a popular open-source program for testing RAM. At its core it is quite simple: it runs a suite of tests on the RAM by writing specially-crafted data to it, designed to expose any issues with the RAM, then reading it back and verifying that it survived the trip.

I ported Memtester to run on bare-metal CI20 by removing most of it: all the command-line parsing and POSIX-specific functionality, apart from random number generation. Instead the tester runs directly in cached kernel memory (0x80000000) and tests a fixed size (200MB). We’re only testing a fixed amount of memory because kseg0 isn’t very large (256MB, when you exclude memory-mapped devices). Anyway, it doesn’t really matter if we don’t touch every byte of memory, since the point isn’t to discover bad RAM but to discover whether the DDR controller and DDR PHY are configured properly — problems which should be obvious even after testing only a very small amount of memory.

Running the test 
Get this version by checking out the OS as normal:

$ git clone https://github.com/nfd/ci20-os
$ cd ci20-os
$ git checkout tags/memtester

Now check out two “ports”:

$ cd ports
$ git clone https://github.com/nfd/ci20-os-port-posix posix
$ cd ..

Now build the system:

$ make
$ cd ports/memtester-4.3.0
$ make
$ cd ../../

Now run everything. This happens in two stages. First we use stage1 to initialise the memory:

$ python3 usbloader.py build/stage1.elf

Look at the serial console output, and when stage1 indicates that the memory test is complete, load memtester.

$ python3 usbloader.py ports/memtester-4.3.0/build/memtester.elf

You should see memtester load and start testing RAM. The test suite repeats until memtester finds a problem.

How it works
The hard part of all of this was the build system, which needed quite a bit of expansion to support “ports”. Ports are libraries or binaries for third-party applications or their support. The idea is that you drop the repository into ports/ and use a custom Makefile to build them as part of the rest of the system. The memtester Makefile looks like this:

PORT_SRC=memtester.c tests.c
PORT_LIBS=build/libci20.a ports/posix/build/libposix.a

include ../baremetal.mk

In other words, Memtester is a program (an ELF file) defined by two C source files, depending on the Posix port and libci20. Pretty straight-forward so far, but the hard work is performed by ports/baremetal.mk, which will build any dependencies and then the port itself. The implementation of baremetal.mk is a bit gruesome, using quite a few Make “features”. I’m perversely proud of it, which are the two feelings I always get when I accomplish something nontrivial in Make.

Next steps
An interesting thing to do now is to edit stage1/ddr.py, changing follow_reference_code=True to follow_reference_code=False. This activates a whole lot of changes related to RAM initialisation, but doesn’t seem to affect system stability — memtester runs just fine, which indicates that DDR might be more robust to timing variations than it looks. An interesting next step might be to measure memory speed in addition to memory reliability, but let’s move on from RAM for a little while: next time we’ll get back to the OS development proper, and look at handing interrupts.