About

I'm Nicholas FitzRoy-Dale, a software developer in Sydney, Australia. I'm interested in embedded systems and operating system research, code optimisation, 8-bit music, rock climbing, shiny things...

Personal blog

Contact

Tue
Oct 6
2009

Passing file descriptors through Unix-domain datagram sockets

This caused me no end of trouble, and Google wasn’t much help:

The example consists of two files. socketmaster.c opens a file you pass it and starts listening on a Unix-domain datagram socket. socketslave.c sends a single-character request to socketmaster’s datagram socket. Upon receipt of the request, socketmaster sends the file descriptor of the open file to socketslave’s datagram socket. Socketslave reads a small amount of data from the file, displays it, and exits. Run socketslave again and you get the next portion of the file, proving that the file pointer is being shared between the (persistent) socketmaster and the (transient) socketslave.

socketmaster.c

socketslave.c

Thanks to the following three links which contain code to do almost, but not quite, what I want: kragen-hacks; blackhats; docs.hp.com.

Sun
Sep 27
2009

Writing asynchronous device drivers with synchronous code

This post is about device drivers and owes a lot to papers written by Leonid Ryzhyk.

Most devices are asynchronous. You tell the device to do something, then some time later it lets you know that it’s finished. The typical programming model for asynchronous devices uses callbacks, something like this:

read_bytes():
	write_to_device_register(READ_DATA)

device_irq_handler():
	if(read_from_device_register(STATUS) & DATA_READY)
		...

The problem with this model is that it’s kind of hard to read. It’s particularly annoying if you are doing something which involves lots of interaction with the device, such as initialisation:

void init(int state)
{
	register_callback(init, state);
	static int counter;

	try_again:
	switch(state) {
		case 1:
			write_to_device_register(SETUP1, ...);
			break;
		case 2:
			if(there was an error) {
				counter += 1;
				if (counter < 10) {
					goto try_again;
				} else {
					die();
				}
			}
			write_to_device_register(SETUP2, ...);
			break;
		case 4:
			handle possible errors;
		default:
			panic();
	}
}

void device_irq_handler(void)
{
	if(callback registered) {
		callback();
	}
}

It would be much easier to read the code if the device behaved synchronously. Then you could just write this:

void init(void) 
{
	int counter;

	for(counter = 0; counter < 10; counter++) {
		write_to_device_register(SETUP1, ...);
		if(no error encountered);
			break;
	}

	if(error_encountered) {
		die();
	}

	write_to_device_register(SETUP2, ...);
	handle possible errors;
}

So how do you make an asynchronous device behave synchronously? There are lots of ways. The most obvious way is simply to poll the device. This may even be feasible for some devices in some situations (for example, when initialising the device), but, in general it’s not a feasible option, because the cycles consumed by polling the device could be better spent doing something (anything!) else.

The second way is to use threads. Under this approach, write_to_device_register() would go to sleep on some synchronisation primitive. The IRQ handler would wake up any thread sleeping on the synch primitive. This approach is pretty much what the Linux kernel does.

Threads have their problems: they really require a lot of thinking to make sure you get both correctness and performance out of the device driver you’re writing. For example, Linux allows multiple threads to execute device driver code simultaneously. Multi-threaded drivers are signficant source of device driver bugs.

But I wasn’t at the stage of writing concurrency bugs. I just didn’t want to manage more than one thread. So instead I abused the pre-processor to create the illusion of synchronous device operation while secretly using callbacks.

Here’s how that device init function looks using this approach:

EC_FUNCTION(init)
	static int counter;

	EC_START
	for(counter = 0; counter < 10; counter++) {
		EC_WRITE(SETUP1, ...);
		if(no_error)
			break;
	}

	if(error) {
		die();
	}

	EC_WRITE(SETUP2, ...);
	handle_possible_errors();
	EC_END
EC_FUNCTION_END

This has the same block structure as the threading version, and is pretty easy to read (once you get over the yucky pre-processor macros). But what is that tell-tale “static” doing in front of “int counter”?

That’s right, this is just syntactic sugar over a callback-based model. After preprocessing, it looks something like this:

void init(struct device_info *info)
{
	static int counter;

	static void *__label = &&label0;
	goto *__label;

label0:
	for(counter = 0; counter < 10; counter++) {
		__label = &&label1;
		device_write(SETUP1, ..., init, info);
		goto end;
label1:
		if(no_error)
			break;
	}

	if(error) {
		die();
	}

	__label = &&label2;
	device_write(SETUP2, ..., init, info);
	goto end;
label2:
	handle_possible_errors();
	__label = &&label0;
end:
	;
}

Hideous, huh? The good thing is, you only need to get this sort of thing right once, and then you don’t need to see it again.

This approach has some really good points: firstly, you get complete control over where you could be pre-empted in a very intuitive way: when you interact with the device. Any internal state manipulation has no chance of being corrupted. It’s not perfect, though: as you can see, you can’t use local variables. If you do want to make use of local variables, you need to add some code to save and restore the stack – essentially continuations.

You can download my preprocessor macros here: event_chain.h.

If you find this sort of thing interesting you should look at Leonid Ryzhyk’s publications, particularly “Dingo: taming device drivers”.

EDIT: This page gives another summary of the same technique (thanks to Benno for the link)

Sat
Aug 1
2009

Recording and replaying events with Android

For benchmarking, I wanted to be able to replay an event sequence. For example, I wanted a test to open the browser, browse to a Web page, scroll around on the page, and then return to the application launcher. Android offers lots of ways to do this, but because I like systems programming (and haven’t touched Java in years) I picked quite a low-level one: capturing and injecting Linux input device events.

Android uses the Linux input system in the standard way, which means events from the keyboard, touchpad, and so forth are available at /dev/input/eventX, where X is some number starting from 0. It is quite straightforward to record these events to a file – each event arrives as a struct input_event, which is (on Android) a 128-byte structure containing a timestamp, an event type, and a payload.

Recording events is easy. What about playing them back? I started dreaming up something that would rename the /dev/input/eventX files and create PTYs for each, but then I discovered that there is a far easier way: thanks to a feature added early in the 2.6 kernel series, you can just inject events by writing struct input_events back to the relevant files!

This ease of debugging really is an advantage of using a fully-fledged kernel rather than a custom or cut-down operating system.

Anyway, the end result is two programs, “record” and “replay”. The “record” program reads events from /dev/input/eventX and writes them to a file named /sdcard/events. The “replay” program reads this file and replays the events in real time.

The programs read from and write to /dev/input/eventX, even though /dev/uinput should in theory work – it doesn’t appear to on the ADP1.

Both programs are quite “proof of concept”, and probably only work on the HTC Dream for now, but they are very simple and easy to adapt.

Update 30/Dec/2009: Aaron made some changes to “replay” which should give much smoother results on OpenMoko phones.

injectevents.tar.gz (6k)

Update 1/Jul/2012: there have been a few questions about this recently. Nice to see it’s still being used! Here is some extra information.

To build the code:

Install the Android NDK and set it up so that ndk-build is in your path.

Then create a directory called jni/ and put Android.mk into it.

Then move the src/ directory inside the jni/ directory.

Then you can run ndk-build to build the binaries.

To install:

You need root access to your phone, then run

adb shell

to get a shell

Put record and replay somewhere you can execute them. /data/local/tmp is a good place – you can use adb push for this, eg

adb push record /data/local/tmp/record

You might then need to “chmod 555 /data/local/tmp/record” to make it executable.

Some people have emailed me asking for help on running the code from within a Java test framework. I can’t help you there I’m sorry. You would run replay, and then you would have to check the expected result somehow. Robotium for example can tell you about the state of the GUI, so if you wanted to check that, e.g., a particular GUI change had occurred, you could run replay and then investigate the GUI with Robotium.

When I used the replay code, it was for benchmarking, not testing. So I wrote my own Android apps to call “replay” and then check the state of the GUI directly.

Mon
Jul 27
2009

GOSUB with Python byte code

This blog post is about some of the terrible ways you can abuse Python byte code: specifically, by using it to implement lightweight function calls (“GOSUB”) directly in byte code. Doing this is obviously stupid and disgusting, and (spoiler) regular Python function calls are fast enough to make this hack not worth the effort. But trying it was quite interesting.

We don’t need a new namespace, or exception handling, or any of the other nice features that Python functions provide. In fact the only thing we need is support for 1) jumping to an arbitrary location and then 2) returning to wherever we came from later. That is, we need GOSUB.

Jumping to arbitrary locations is very easy, since Python bytecode has a JUMP_ABSOLUTE opcode. Unfortunately the argument to this opcode must be an offset calculated at compilation time – there is no way to change the address of a JUMP_ABSOLUTE statement from within the bytecode itself. This makes returning to wherever we came from later impossible with JUMP_ABSOLUTE – unless we only ever call the subroutine from one location, which isn’t true in general.

Fortunately, more devious people than I think about these problems: Phillip J. Eby has discovered that a certain opcode, END_FINALLY, normally used at the end of Python finally blocks, does its work by taking the address to return to from the stack. If we put the location we want to jump to on the stack and call END_FINALLY, we can jump wherever we want. Effectively, it’s a computed GOTO. Sadly, END_FINALLY was designed as part of the exception handling machinery and not as part of the dodgy-hack-supporting machinery. As a result, the code is a little more complicated than you might think – but not much.

This hack can be nicely abused to implement GOSUB, like so:

          0 LOAD_CONST               4 (12)
          3 LOAD_CONST               1 (37)
          6 SETUP_LOOP               3 (to 12)
    >>    9 JUMP_ABSOLUTE           14
         12 POP_BLOCK           
         13 RETURN_VALUE        
    >>   14 LOAD_CONST               2 (2)
         17 INPLACE_ADD         
         18 ROT_TWO             
         19 LOAD_CONST               3 (32)
         22 END_FINALLY         

Here we are calling a subroutine that adds 2 to the value on the top of the stack. First we push a value to return to after the subroutine finishes onto the stack – in this case we return to byte 12. Then we push an argument to the subroutine, in this case 37. SETUP_LOOP creates a block for looping or exception handling, and is required to make the END_FINALLY hack work. We then jump directly to the subroutine at byte 14.

In the subroutine we add 2 to whatever is at the top of the stack. Our stack then contains the return address, followed by the result of our calculation. We’re about to jump to the return address, so we want that at the top of the stack. ROT_TWO does that for us by swapping the top items at the top of the stack. We then LOAD_CONST a magic value that is part of the END_FINALLY hack – it is WHY_CONTINUE, a number that has special meaning to the CPython interpreter. It pops WHY_CONTINUE off the stack, which instructs it to continue execution at the value at the top of the stack – which is now our return value.

Execution then resumes at byte 12, which removes the block we created with SETUP_LOOP, and then returns the value at the top of the stack, which is the return code from our subroutine. In this case the function returns the value 39.

There are some problems with this approach:

  • POP_BLOCK, in addition to removing a block from the block stack, also returns the stack level to the level that it was when SETUP_LOOP was called. So if the subroutine returns any values on the stack you have to be careful. One simple way to deal with this is to have subroutines that take as many arguments as they return (i.e. 1).
  • BREAK_LOOP now has useless (but well-defined) behaviour: it acts as if the subroutine has immediately returned, but doesn’t provide any way of letting the caller know what happened. BREAK_LOOP still works if the subroutine sets up an actual (real) loop by pushing another block to the block stack. It should actually be possible to “fix” this by having SETUP_LOOP target an exception-handling block, essentially giving the subroutine a choice of two different locations to return to – taking the nastiness to the next level!
  • The CPython block stack is only 20 levels deep, which is pretty reasonable if you think about it as being equivalent to 20 levels of indentation. :) This means that as things currently stand, subroutine calls can’t be nested very far. There is a very easy way to fix this one, though.
  • All this is an incredibly seedy hack-on-top-of-a-hack. It relies on a side-effect of END_FINALLY that is only guaranteed to work in the canonical C implementations of Python, and even then only versions 2.3 and upward.

Adding an extra opcode would make this much nicer. There was some talk back in 2006 about implementing a JUMP_TOP opcode, which would simply jump to the bytecode position at the top of the stack. Unfortunately, the motivation for adding it was to add a “switch” statement to Python, and nowadays that doesn’t seem very likely.

Fortunately, there is no real motivation to use this hack. Running a simple test (10000000 iterations, on a 2.2Ghz MacBook) gives the following results:

MethodTime in seconds
GOSUB2.78
CALL_FUNCTION, hand-written3.00
CALL_FUNCTION, Python3.67

In these results, “GOSUB” is the hand-written GOSUB hack discussed above, and the other two use the regular Python function call mechanism: one with hand-written bytecode, and the other written in regular Python and compiled in the normal way. Even with ten million calls there isn’t that huge a speed improvement to make GOSUB worthwhile.

Mon
Jul 27
2009

Taking screenshots with an Android ADP1

There are lots of tutorials around the ‘net to take screenshots using DDMS. Unfortunately using DDMS makes it difficult to take automated screenshots for testing purposes.

Fortunately Android uses reasonably-standard Unix conventions for access to devices, so you can read the data from /dev/graphics/fb0. The full procedure for getting a screenshot is:

adb pull /dev/graphics/fb0 fb0
python rgb565torgb888.py <fb0 >fb0.888
convert -depth 8 -size 320x480 RGB:fb0.888 fb0.png

Imagemagick (and thus convert) doesn’t seem to deal with the 16-bit colour (565) image format, so I wrote a program to expand the data to 24-bit (888) before passing it to convert. You can use mine (rgb565torgb888.py, 311 bytes) or any other one you may find out there. :)

Update: If you’re using Python 3, here is a version which will work for you. Use it without redirection, like this:

python3 rgb565torgb888_py3.py fb0 fb0.888

Newer entries