Sun
Feb 4

Genetic Algorithms vs Super Mario World

I've been experimenting with using genetic algorithms to learn to play video games. It's working well -- above is Mario solving the first level of Super Mario World. I learnt a couple of interesting things about what works and what doesn't while playing around.

Architecture

I used a fixed-size neural network consisting of 32 x 28 = 896 input nodes, a fully-connected hidden layer of 64 nodes, and 8 output nodes for the 8 things I wanted Mario to be able to do (move left, right, up, or down; press A, B, X, or Y).
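The topology above can be sketched in a few lines of numpy. This is a minimal sketch under my own assumptions: the post doesn't say which activation function was used (I use tanh here), and the "press a button when its output is positive" threshold and all function names are mine.

```python
import numpy as np

INPUTS, HIDDEN, OUTPUTS = 32 * 28, 64, 8  # screen tiles in, button states out
BUTTONS = ["left", "right", "up", "down", "A", "B", "X", "Y"]

def random_genome(rng):
    """A genome is just the two weight matrices, drawn at random."""
    return {
        "w1": rng.standard_normal((INPUTS, HIDDEN)),
        "w2": rng.standard_normal((HIDDEN, OUTPUTS)),
    }

def activate(genome, screen_tiles):
    """Feed the flattened 32x28 tile grid forward; press a button
    whenever its output node is positive."""
    hidden = np.tanh(screen_tiles.flatten() @ genome["w1"])
    outputs = np.tanh(hidden @ genome["w2"])
    return {button: out > 0 for button, out in zip(BUTTONS, outputs)}
```

Because the topology is fixed, a genome is nothing more than the flat array of weights, which makes crossover and mutation straightforward.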

Networks were evaluated using a custom-written scripting interface for Snes9x, which communicated with Python over an inter-process pipe. Genomes were assigned a fitness according to Mario's final position. Mario's leftmost possible position is 8, and he finishes the level at about position 4800, so fitnesses range over roughly those values.
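The fitness assignment itself is tiny; a sketch using the positions quoted above (the clamping to the level's range is my assumption, and the emulator side is not shown):

```python
LEVEL_START, LEVEL_END = 8, 4800  # Mario's leftmost position and the goal

def fitness(final_x_position):
    """Fitness is simply how far right Mario got, clamped to the
    level's possible range of positions."""
    return max(LEVEL_START, min(final_x_position, LEVEL_END))
```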

Random generation is surprisingly good

There was an interesting result published recently which showed that you could do pretty well on some Atari video games just by generating a large number of networks randomly (and picking the best one). In fact, random generation outperforms deep reinforcement learning algorithms on some games "suggesting that these domains may be easier than previously thought", say the authors politely.

This was definitely the case in my experience -- first-generation randomly-generated networks applied to Super Mario World get up to 3/4 of the way through the level without any learning algorithm applied whatsoever.
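The baseline described above is almost embarrassingly simple to implement: generate a pile of random genomes, score each one, keep the best. A sketch, assuming the genome layout from the architecture section and a caller-supplied `evaluate` function (both names are mine):

```python
import numpy as np

def random_search(n_genomes, evaluate, rng):
    """Generate n random fixed-topology genomes, score each one with
    `evaluate` (genome -> fitness), and return the best."""
    best_genome, best_fitness = None, float("-inf")
    for _ in range(n_genomes):
        genome = {
            "w1": rng.standard_normal((32 * 28, 64)),
            "w2": rng.standard_normal((64, 8)),
        }
        f = evaluate(genome)
        if f > best_fitness:
            best_genome, best_fitness = genome, f
    return best_genome, best_fitness
```

In my runs, `evaluate` was the emulator round-trip described earlier, which is also where nearly all of the wall-clock time goes.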

NEAT isn't great at Super Mario World

There's a famous video demonstrating an evolutionary algorithm called NEAT applied to Super Mario World. In my experiments with NEAT, I found that it didn't seem to be very well suited to this problem at all:

  • NEAT forces a slow start: NEAT grows the neural network over multiple generations by randomly adding neurons and connections. However, it is much faster to simply start with a large randomly-generated neural network and take advantage of the effects of random generation (see previous section). With a single connection, the best thing a first-generation NEAT network can do is move Mario to the right and walk him into the first enemy, to his doom; this gives it a fitness score of about 400. On the other hand, the best randomly-generated large neural network jumps over multiple enemies, stomps a couple of them, and clears various other hazards for an initial score of about 2500.
  • NEAT speciation unnecessarily splits the gene pool: The NEAT algorithm attempts to preserve genetic features using speciation. This is unnecessary for this problem and simply results in less genetic variation when breeding genomes.
  • NEAT prefers genes from the fitter of the two genomes when breeding, which doesn't really suit the problem: I'm not quite sure about this -- perhaps it is fine -- but Super Mario World levels are probably better described as a series of challenges, rather than a single long challenge. This is because the challenges presented (jumps, enemies) tend to repeat throughout the level. It is not obvious that a genome which stomps an enemy at the end of the level is better than a genome which stomps the same kind of enemy early on. A simpler approach (of always randomly selecting the parental gene to use) doesn't address the issue, but ameliorates it a little by ensuring that all genomes which are judged fit enough to breed (in my case, the top-performing 20% of the gene pool) will contribute the same number of genes.
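The "simpler approach" from the last bullet can be sketched concretely: uniform crossover over the fixed-size weight arrays, with the top 20% of the pool as parents. This is a sketch under my assumptions (function names, elite selection by sorting, and random parent pairing are mine; mutation is omitted for brevity):

```python
import numpy as np

def breed(parent_a, parent_b, rng):
    """Uniform crossover: each weight is taken from a randomly chosen
    parent, so both parents contribute roughly equal numbers of genes
    regardless of which one scored higher."""
    child = {}
    for key in parent_a:
        mask = rng.random(parent_a[key].shape) < 0.5
        child[key] = np.where(mask, parent_a[key], parent_b[key])
    return child

def next_generation(population, fitnesses, rng, elite_fraction=0.2):
    """Keep the top-performing fraction as parents and refill the
    population by breeding randomly chosen parent pairs."""
    order = np.argsort(fitnesses)[::-1]
    n_elite = max(2, int(len(population) * elite_fraction))
    parents = [population[i] for i in order[:n_elite]]
    children = []
    while len(children) < len(population):
        a, b = rng.choice(len(parents), size=2, replace=False)
        children.append(breed(parents[a], parents[b], rng))
    return children
```

The point of the 50/50 mask in `breed` is exactly the one made above: unlike NEAT's fitness-weighted gene inheritance, every genome that makes the breeding cut contributes the same expected number of genes.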

I found that using NEAT on this problem took significantly longer than using a fixed-size network, but more investigation is required before I can say anything really conclusive.