Jul 26
2009

GOSUB with Python byte code

This blog post is about some of the terrible ways you can abuse Python byte code: specifically, by using it to implement lightweight function calls ("GOSUB") directly in byte code. Doing this is obviously stupid and disgusting, and (spoiler) regular Python function calls are fast enough to make this hack not worth the effort. But trying it was quite interesting.

We don't need a new namespace, or exception handling, or any of the other nice features that Python functions provide. In fact the only thing we need is support for 1) jumping to an arbitrary location and then 2) returning to wherever we came from later. That is, we need GOSUB.

Jumping to arbitrary locations is very easy, since Python bytecode has a JUMP_ABSOLUTE opcode. Unfortunately the argument to this opcode must be an offset calculated at compilation time -- there is no way to change the address of a JUMP_ABSOLUTE statement from within the bytecode itself. This makes returning to wherever we came from later impossible with JUMP_ABSOLUTE -- unless we only ever call the subroutine from one location, which isn't true in general.

Fortunately, more devious people than I think about these problems: Phillip J. Eby has discovered that a certain opcode, END_FINALLY, normally used at the end of Python finally blocks, does its work by taking the address to return to from the stack. If we put the location we want to jump to on the stack and call END_FINALLY, we can jump wherever we want. Effectively, it's a computed GOTO. Sadly, END_FINALLY was designed as part of the exception handling machinery and not as part of the dodgy-hack-supporting machinery. As a result, the code is a little more complicated than you might think -- but not much.

This hack can be nicely abused to implement GOSUB, like so:

          0 LOAD_CONST               4 (12)
          3 LOAD_CONST               1 (37)
          6 SETUP_LOOP               3 (to 12)
    >>    9 JUMP_ABSOLUTE           14
         12 POP_BLOCK           
         13 RETURN_VALUE        
    >>   14 LOAD_CONST               2 (2)
         17 INPLACE_ADD         
         18 ROT_TWO             
         19 LOAD_CONST               3 (32)
         22 END_FINALLY

Here we are calling a subroutine that adds 2 to the value on the top of the stack. First we push a value to return to after the subroutine finishes onto the stack -- in this case we return to byte 12. Then we push an argument to the subroutine, in this case 37. SETUP_LOOP creates a block for looping or exception handling, and is required to make the END_FINALLY hack work. We then jump directly to the subroutine at byte 14.

In the subroutine we add 2 to whatever is at the top of the stack. Our stack then contains the return address, followed by the result of our calculation. We're about to jump to the return address, so we want that at the top of the stack. ROT_TWO does that for us by swapping the top items at the top of the stack. We then LOAD_CONST a magic value that is part of the END_FINALLY hack -- it is WHY_CONTINUE, a number that has special meaning to the CPython interpreter. It pops WHY_CONTINUE off the stack, which instructs it to continue execution at the value at the top of the stack -- which is now our return value.

Execution then resumes at byte 12, which removes the block we created with SETUP_LOOP, and then returns the value at the top of the stack, which is the return code from our subroutine. In this case the function returns the value 39.

There are some problems with this approach:

  • POP_BLOCK, in addition to removing a block from the block stack, also returns the stack level to the level that it was when SETUP_LOOP was called. So if the subroutine returns any values on the stack you have to be careful. One simple way to deal with this is to have subroutines that take as many arguments as they return (i.e. 1).
  • BREAK_LOOP now has useless (but well-defined) behaviour: it acts as if the subroutine has immediately returned, but doesn't provide any way of letting the caller know what happened. BREAK_LOOP still works if the subroutine sets up an actual (real) loop by pushing another block to the block stack. It should actually be possible to "fix" this by having SETUP_LOOP target an exception-handling block, essentially giving the subroutine a choice of two different locations to return to -- taking the nastiness to the next level!
  • The CPython block stack is only 20 levels deep, which is pretty reasonable if you think about it as being equivalent to 20 levels of indentation. :) This means that as things currently stand, subroutine calls can't be nested very far. There is a very easy way to fix this one, though.
  • All this is an incredibly seedy hack-on-top-of-a-hack. It relies on a side-effect of END_FINALLY that is only guaranteed to work in the canonical C implementations of Python, and even then only versions 2.3 and upward.

Adding an extra opcode would make this much nicer. There was some talk back in 2006 about implementing a JUMP_TOP opcode, which would simply jump to the bytecode position at the top of the stack. Unfortunately, the motivation for adding it was to add a "switch" statement to Python, and nowadays that doesn't seem very likely.

Fortunately, there is no real motivation to use this hack. Running a simple test (10000000 iterations, on a 2.2Ghz MacBook) gives the following results:

MethodTime in seconds
GOSUB2.78
CALL_FUNCTION, hand-written3.00
CALL_FUNCTION, Python3.67

In these results, "GOSUB" is the hand-written GOSUB hack discussed above, and the other two use the regular Python function call mechanism: one with hand-written bytecode, and the other written in regular Python and compiled in the normal way. Even with ten million calls there isn't that huge a speed improvement to make GOSUB worthwhile.