Feb 16

World's cutest TTY

Using pseudoterminal support and very rudimentary VT-100 emulation to make the 128x32 PiOLED display into a TTY.

Feb 8

(A little more) Super Mario World

The above video shows a neural network solving a very simple Super Mario World level. So far, so already done, but this neural network had absolutely no training whatsoever: it is from the first generation of a population which was going to be evolved by a genetic algorithm. But since it's the first generation, no evolution actually took place -- the successful run of the level was generated entirely randomly. The entire process, from generating the initial population of random genomes, through evaluating them (a population of 200), to writing the successful genome to disk, took 23 seconds.*

I'm a curious mixture of pleased, impressed, and embarrassed.

* Concurrency was 5 though. With just one emulator running it would have taken about two minutes.

Feb 4

Genetic Algorithms vs Super Mario World

I've been experimenting with using genetic algorithms to learn to play video games. It's working well -- above is Mario solving the first level of Super Mario World. I learnt a couple of interesting things about what works and what doesn't while playing around.


I used a fixed-size neural network consisting of 32 x 28 = 896 input nodes, a fully-connected hidden layer of 64 nodes, and 8 output nodes for the 8 things I wanted Mario to be able to do (move left, right, up, or down; press A, B, X, or Y).

Networks were evaluated using a custom-written scripting interface for Snes9x which communicated with Python over a multi-process pipe. Genomes were assigned a fitness according to the final position of Mario. Mario's leftmost position is position 8, and he finishes the level at about position 4800, so this is about the range of fitnesses possible.

Random generation is surprisingly good

There was an interesting result published recently which showed that you could do pretty well on some Atari video games just by generating a large number of networks randomly (and picking the best one). In fact, random generation outperforms deep reinforcement learning algorithms on some games "suggesting that these domains may be easier than previously thought", say the authors politely.

This was definitely the case in my experience -- first-generation randomly-generated networks applied to Super Mario World get up to 3/4 of the way through the level without any learning algorithm applied whatsoever.

NEAT isn't great at Super Mario World

There's a famous video demonstrating an evolutionary algorithm called NEAT applied to Super Mario World. In my experiments with NEAT, I found that it didn't seem to be very well suited to this problem at all:

  • NEAT forces a slow start: NEAT grows the neural network over multiple generations by randomly adding neurons and connections. However, it is much faster to simply start with a large randomly-generated neural network and take advantage of the effects of random generation (see previous section). With a single connection, the best thing that a first-generation NEAT network can do is move Mario to the right and walk him into the first enemy, to his doom; this gives it a fitness score of about 400. On the other hand the best randomly-generated large neural network jumps over multiple enemies, stomps a couple of them, and clears various other hazards for an initial score of about 2500.
  • NEAT speciation unnecessarily splits the gene pool: The NEAT algorithm attempts to preserve genetic features using speciation. This is simply unnecessary for this problem and simply results in less genetic variation when breeding genomes.
  • NEAT prefers genes from the fitter of the two genomes when breeding, which doesn't really suit the problem: I'm not quite sure about this -- perhaps it is fine -- but Super Mario World levels are probably better described as a series of challenges, rather than a single long challenge. This is because the challenges presented (jumps, enemies) tend to repeat throughout the level. It is not obvious that a genome which stomps an enemy at the end of the level is better than a genome which stomps the same kind of enemy early on. A simpler approach (of always randomly selecting the parental gene to use) doesn't address the issue, but ameliorates it a little by ensuring that all genomes which are judged fit enough to breed (in my case, the top-performing 20% of the gene pool) will contribute the same number of genes.

I found that using NEAT on this problem took significantly longer than using a fixed network, but more investigation is required to say anything really conclusive.

Dec 26



Just in time for Christmas, almost, here is the classic Amiga demo State of the Art, runnable in a browser. Works well on Chrome, Safari, and Firefox on a computer. Less well on mobile (missing or broken audio — work in progress) and Edge (nobody cares about Edge).

Nov 23

Wire colour to TRS connector, Apple Lightning to 3.5" adapter

The Lightning to 3.5” adapter is a small digital to analogue converterconnected to a "standard headphone" (not for long) tip-ring-sleeve socket. If you cut the socket off (carefully, as the wires are thin and coated in plastic), you can repurpose it.

Here is the map between wire colour and destination on a 3.5” tip-ring-sleeve connector:

Wire colour TRS destination Purpose
Red Tip Left
Green+Gold Tip Mic
Blue Sleeve Ground
Green Ring Right
Red+Green Sleeve Mic ground

If you are going to attach solder to the wires, it's helpful to melt the plastic off them first (with a lighter, for example). 

Oct 29

Supporting Clang in the CI20 bare-metal project

This post is part of the CI20 bare-metal project.

Today I added support for the LLVM toolchain (The C/C++ frontend, Clang, and the linker, lld), in the CI20 bare-metal project. It is now possible to build the CI20 bare-metal project using only LLVM tools.

That doesn’t mean GCC support is gone, though! You can now choose between the two toolchains by setting the TOOLCHAIN CMake variable to either ‘gcc’ or ‘clang' (for example, by running ccmake, or by passing it to the cmake command line). If you’re using GCC, you can set TOOLCHAIN_PREFIX; if you’re using Clang, you can set CLANG_PATH (to the bin directory containing clang and friends). This reflects the fact that gcc tends to be prefixed for different targets, whereas Clang just uses the same binary for all targets.

Adding the support wasn’t as hard as I had thought — massive respect to the llvm team for building something so compatible. Here are some notes. I’m going to try to make them interesting, but I’ve got to be honest: it’s about adding toolchain support to an operating system; there are limits as to how interesting things could possibly get. 

  • It is significantly easier to build the LLVM toolchain than it is to build GCC — LLVM uses cmake, includes support for all architectures by default, and automatically includes tool support (clang and lld) if it’s in the right place. This is much nicer than the Makefiles, option-tastic configure scripts, and (if you wish to stay sane) third-party cross-compiler build tools of GCC.
  • It is much harder to get information out of llvm tools. Want to know what target triples clang supports? Check the source! On the other hand, the defaults actually do what I want — e.g. the right flavour of MIPS is in there by default (when I figured out how to ask for it) and the linker doesn’t need any particular configuration to support everything. 
  • CMake’s support for configuration is somewhat  rudimentary but works. Internally, user configuration is done by changing variables in the CMake cache — when CMakeLists.txt is evaluated, variables are substituted from the cache if they are present. This weird nomenclature threw me for a bit but it makes sense once you know the internal details.
  • LLVM’s linker, lld, is a bit less forgiving than GNU ld. I ran into a somewhat opaque error in which lld was complaining that the section flags for the .text section were 0x4 but I wanted to add a file with flags of 0x6; turns out that this is because I had set my start.S file as executable (0x4, ‘x') but not as allocating space (0x2, ‘a’) and lld was using that object (the first one it saw) to determine section flags. GNU ld, on the other hand, did the sensible thing and added the ‘a’ flag by itself.

Building an LLVM toolchain for the CI20 bare-metal project

If you want to use clang/lld instead of gcc/ld, and you don’t have a suitable clang installed, you’ll need to compile it yourself. I followed the Clang getting started instructions (also see the LLVM getting started instructions) and made these changes:

  • Downloaded lld: cd llvm/tools ; svn co lld
  • Set CMAKE_BUILD_TYPE to Release as I’m not interested in actually hacking on llvm
  • Set CMAKE_INSTALL_PREFIX so as not to interfere with my system toolchain

After that it was a simple make -j4 and make install.

Building a GCC toolchain for the CI20 bare-metal project, 2017 edition (on a Mac)

This isn’t very fair as a comparison, because I’m using a case-insensitive file system. But if you are, too, and want to use GCC, here’s how:

  • Install Crosstool-ng. Previously I’ve used Homebrew for this, but the Homebrew crosstool recipe is currently broken, so I cloned it from Github and installed it manually.
  • Create a case-sensitive disk image (for building) and mount it somewhere.
  • Create a case-sensitive disk image (for installing) and mount it somewhere. (Tip: hdiutil attach <dmg> -mountpoint <mountpoint> if you don’t want to run your toolchain from /Volumes)
  • Create a build directory inside your build image and cd into it.
  • ct-ng mips-unknown-elf
  • make menuconfig
  • Set prefix directory (Paths and Options) to your case-sensitive disk image.
  • Set endianness (Target Options) to “Little endian”.
  • Save and exit.
  • ct-ng build
  • Unmount (and later delete) the build disk image when done.

As a Mac-specific bonus, gcc, ld, ar, and ranlib (at least) appear to work fine even if they’re running from a case-insensitive filesystem. So if you like, you can unmount your installation image, mount it in /Volumes, and copy the installed files to the right spot (i.e. to the same directory structure as the previously-mounted image). You then don’t need to have any disk images mounted to run gcc.

Sep 18

zlib Z_DATA_ERROR "invalid distance too far back"

If anyone else is looking for this: Internet wisdom says it’s corrupt input data, but one other cause could be that you misread the documentation and passed 32 (enable gzip and zlib decoding with header detection) to inflateInit2, but forgot to add a window size. Pass 32 + 15. Magic constants are bad.

Jul 31

Evolution of SNES games (I swear real work is coming out of this)

Super Nintendo Entertainment System games made their backgrounds out of 8x8 pixel (usually) tiles. Have a look at the first level of Super Mario World, superimposed its tilemap:

Smw tilemap

It’s not super easy to see, but it’s clearly very regular. Walkable parts use tiles 385, 386, and 387. Blocks use tiles 96, 97, 98, and 99. The background is all tile 248 (there is actually a separate tile layer with the hills and things).

By contrast, look at level 1 of Donkey Kong Country:

Dkc tilemap

The walkable portion is all messed up. The walkable ground is made up of tiles, what, 345, 168, 166, 221, 19, 282, 169… and that’s just the ones I looked at quickly in the lower left.

Super Mario World came out in 1990, with the release of the SNES. Donkey Kong Country came out four years later. It seems pretty clear that the tilemap for SMW was made by hand. It seems almost equally clear that the tilemap for DKC was made by a computer. Not obvious here, but quite interesting, is that several tiles are repurposed. For example, there is a tile which is just “noise” — light brown dots on a darker brown background, used as part of the ground. However, that exact same tile, this time using a green palette, also forms part of the forest backdrop. 

This has implications for automated game-playing: it’s pretty obvious that you could train a computer to play SMW just by looking at the tilemap, but for DKC there is presumably a separate memory area which stores the walkable path — looking at the tile map doesn’t buy you much over looking at the bitmap itself.

Feb 5

Raytracing in the browser

Large and pretty smaller

On a whim, I bought “Ray Tracing in One Weekend” and wrote a ray tracer. I then put it on the web using Emscripten. Check it out here:

Interestingly / scarily, the Web version runs almost as fast as the native version on my Mac. That’s really impressive.

Jan 28

CI20: User-space cache coherency

This is a CI20 bare-metal project post.

Back in the dual-core post we talked a bit about multicore cache coherency — specifically, that kseg0 didn’t have it. This meant that if two cores wanted to communicate using shared memory in kseg0 they would have to use explicit cache flushing, or turn caching off entirely.

Now that we have a user mode, we can run basically the same test in user space, where things are very different. To do this, we modify sigma0’s main.c to do one of two things:

  • If it’s core 0, increment a counter
  • If it’s core 1, periodically print the counter.

The code for this is in the userspace-cache-coherency branch. If you build and run that, you should see a changing number when the kseg0 version repeatedly printed zeroes. The user-space version is somehow magically keeping the caches coherent between the two cores.

The magic is that the hardware implements the MSI protocol: when one core modifies a cache, the other cores find out somehow (either by bus snooping or by consulting a central directory) so they can invalidate their own state. This is very convenient for us, so why isn’t it implemented in kseg0 as well? Actually, on some architectures, it can be (by setting the CCA bits in the Config register of CP0). But this is implementation-specific: the standard “cacheable” setting (3) of CCA is incoherent.

As noted in the “When You Say Nothing At All” post, the philosophy we’ll adopt with this operating system is to leave inter-core communication to user space, as much as possible, and cache coherence is one of the reasons why.