Jan 15
2017

The Saruman suite of ELF manipulation tools

I’m writing an operating system and one of the interesting parts is constructing the executables. I’d like the kernel to be as small as possible, which means deferring a lot of its functionality to a separate program running with lower privileges — in user space, in other words. But this leads to a bootstrapping problem, since support for loading a user-space program is exactly the sort of complicated thing I would prefer not to add to my kernel.

My solution is to store the initial user-space program, sigma0, as part of the kernel executable. This way, whatever loads the kernel also ends up writing the user-space program into memory as well. This isn’t an original solution; the typical ways people do this are to use objcopy to convert the user-space program to a raw binary blob, or to include the user-space program verbatim and add an ELF loader to the kernel. Neither of these solutions is particularly great — raw binary blobs are by definition structureless and hard to debug, and I don’t want to add an ELF loader to the kernel. (The Linux solution, called initrd, incorporates an entire file system as a giant blob — which is fine for Linux, as it is a giant kernel and already includes filesystem-mounting and ELF-loading functionality.)

So my solution is to make one ELF file containing both kernel and user-space program. ELF files are structured into segments (among other things), so all one needs to do is construct a new ELF file containing segments both for the kernel and for the user-space program, then patch up the kernel to tell it where the user-space program should be.

To cut a long story short, that need, and a few other ELF-related requirements I had, weren’t well served by the standard toolchain. So I wrote the Saruman suite of pipeline-friendly ELF tools.

Githubhttps://github.com/nfd/saruman 
License: Public domain, as usual 

There are three tools currently.

objcat combines ELF files by producing a new file containing all the loadable segments from all input files. For example, objcat program1 program2 >program3. This obviously doesn’t produce anything interesting if the virtual addresses of the input segments overlap, but it’s useful if they don’t.

objinfo prints information from ELF files. Currently it can print the value of a symbol (objinfo -S name program) or the entrypoint of the file (objinfo -E program)

objpatch lets you overwrite specific bytes in ELF files. The interesting part is that the address of the bytes is specified as a virtual address. Objpatch will find the part of the file corresponding with the vaddr you specify. For example, objpatch -V 0x80000c00=u32:0x4000 program >out will overwrite the unsigned 32-bit integer which would load at virtual address 0x80000c00 with the value 0x4000. 

I’ll probably add more to these in future, but it’s already enough to produce the two-executables-in-one-ELF functionality I need for my kernel.

(Why Saruman and not Sauron? Because Sauron took a good thing and made it bad, but Saruman took a bad thing and made it worse.)