In my last post[^prev], I wrote about booting VMs using QEMU and UEFI. The result was a disk image with the Linux kernel and an initramfs in the ESP and a base Debian install in a second partition with an ext4 filesystem. While I was quite happy to have figured out how to boot with UEFI instead of SeaBIOS and Grub, I was unhappy about how complicated the EFI variable setup remained.

Linux Command Line

The main reason for having to dance with the UEFI firmware GUI in the last post was to set the kernel command-line. Recall that as long as the kernel is compiled with CONFIG_EFI_STUB, the UEFI firmware can execute the kernel as an EFI application. However, unless the entire filesystem is going to live in RAM, the kernel needs to locate a disk, mount it, switch root, and exec an init program. The process of finding the disk is not automatic; the kernel must be told where to look via the command-line[^cmd].

console=ttyS0 root=/dev/sda2 rw initrd=initrd.img

Usually a bootloader is responsible for executing the kernel with arguments like these. The root and initrd arguments are derived from how I partitioned the disk and located certain files in the filesystems.

Unified Kernel Image

Executing the kernel with a command-line works great for a physical system like my desktop, which boots using UEFI and whose variables have not changed since I first provisioned the machine. But for virtual machines, how do I know what command-line arguments to set for any given disk image? I could pop open the hood, namely mount the partitions to figure out where the root filesystem and initramfs are located, but I tend to be allergic to friction when using developer tools. And sure, these steps could be programmed into the tool that creates the disk in the first place, but then I would still need to solve the problem of manipulating EFI variables programmatically.

As it turns out, there is a better way. A unified kernel image is, in my own paraphrase, a kernel image combined with a lightweight bootloader in a single file. At the end of the day, it’s not all that different from the user experience of running Grub, which I find very nice. My only gripe is how obtuse the build steps are.

The Arch wiki lists a few options for building UKIs[^uki]. The ukify tool provided by systemd works great, but I want a native solution that I can plug into a Zig program. The fact that a UKI can be created with objcopy means it is some sort of object file that can be parsed and modified. How hard can that be?

Portable Executables

UEFI applications are stored in the Portable Executable file format[^pe]. The GNU binutils programs such as objcopy and objdump are capable of reading and writing PE files. This makes getting started with the format much easier than looking at hexdump, and the Arch wiki even has the objcopy command for creating a UKI. With this starting point, I should be able to dissect the logic and recreate it in Zig code.

The objcopy command looks like

$ objcopy --add-section .cmdline="/etc/kernel/cmdline" ... /usr/lib/systemd/boot/efi/linuxx64.efi.stub linux.efi

There are a few other --add-section arguments and some --change-section-vmas that I have skipped for simplicity. The linuxx64.efi.stub file is provided by systemd-boot[^systemd-boot] and is itself a UEFI application (and PE file). This command is making a copy of linuxx64.efi.stub named linux.efi and adding a few additional sections.

$ objdump -h /usr/lib/systemd/boot/efi/linuxx64.efi.stub 

/usr/lib/systemd/boot/efi/linuxx64.efi.stub:     file format pei-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         0000ce1e  000000014df91000  000000014df91000  00000400  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .rodata       00003ee0  000000014df9e000  000000014df9e000  0000d400  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .data         00000088  000000014dfa2000  000000014dfa2000  00011400  2**4
                  CONTENTS, ALLOC, LOAD, DATA
  3 .sbat         000000f4  000000014dfa3000  000000014dfa3000  00011600  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .sdmagic      00000030  000000014dfa4000  000000014dfa4000  00011800  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 .reloc        00000084  000000014dfa5000  000000014dfa5000  00011a00  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA

There are 6 sections already in the file, and their headers have metadata about where the binary contents are located, the size of those contents, and a few characteristics such as code vs data. All the headers are at the top of the file and are followed by all of the contents, as opposed to having each header come right before its content.
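To make the header layout concrete, here is a minimal sketch of a section header as the PE/COFF format lays it out: 40 bytes, with an 8-byte NUL-padded name. The `SectionHeader` type and both helper functions are hypothetical stand-ins that I named for illustration (the real code uses `win32.IMAGE_SECTION_HEADER`), and the sketch assumes a Zig 0.12 to 0.14 era standard library.

```zig
const std = @import("std");

// Hypothetical stand-in for win32.IMAGE_SECTION_HEADER; the field order
// follows the PE/COFF section table entry.
const SectionHeader = extern struct {
    name: [8]u8, // NUL-padded, not necessarily NUL-terminated
    virtual_size: u32, // size of the section once loaded
    virtual_address: u32, // RVA, relative to ImageBase
    size_of_raw_data: u32, // size of the contents in the file
    pointer_to_raw_data: u32, // file offset of the contents
    pointer_to_relocations: u32,
    pointer_to_linenumbers: u32,
    number_of_relocations: u16,
    number_of_linenumbers: u16,
    characteristics: u32, // flag bits such as code vs data
};

comptime {
    // Each entry in the section table is exactly 40 bytes.
    std.debug.assert(@sizeOf(SectionHeader) == 40);
}

/// Returns the section name with trailing NUL padding removed.
fn sectionName(hdr: *const SectionHeader) []const u8 {
    const end = std.mem.indexOfScalar(u8, &hdr.name, 0) orelse hdr.name.len;
    return hdr.name[0..end];
}

/// The VMA column that objdump prints: image base plus the section's RVA.
fn sectionVma(image_base: u64, hdr: *const SectionHeader) u64 {
    return image_base + hdr.virtual_address;
}

pub fn main() void {
    // Values taken from the .text row of the stub's section table.
    var hdr = std.mem.zeroes(SectionHeader);
    @memcpy(hdr.name[0..5], ".text");
    hdr.virtual_address = 0x1000;
    std.debug.print("{s} {x}\n", .{ sectionName(&hdr), sectionVma(0x14df90000, &hdr) });
}
```

The header stores only the relative address; the absolute VMA in the objdump table is that value added to the ImageBase from the optional header.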

While the .text section with the UEFI bootloader will remain untouched, the full objcopy command will add the sections .cmdline, .linux, and .initrd, among a few others. And this is the light-bulb moment if my initial description of a UKI didn’t click: the final PE file has everything inside of it that is required for a boot. The firmware simply needs to execute the UKI.

In practical terms, the beauty of a UKI means that the disk image is portable; it can be distributed and booted without having to explore its contents (okay, except for the architecture of the UEFI applications).

Implementation

My approach to this problem was to first recreate objdump. This might seem like a tangent to my actual goal of recreating objcopy, but the former lets me start with just the parsing step.

The file and section headers are pretty easy to read thanks to Zig’s GenericReader.readStruct().

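    // Scratch space for the section table, aligned so it can be viewed as section headers.
    var buf: [4096]u8 align(@alignOf(win32.IMAGE_SECTION_HEADER)) = undefined;

    const f_in = try fs.cwd().openFile(input_path, .{});
    defer f_in.close();

    const in = f_in.reader();

    // e_lfanew in the DOS header holds the file offset of the NT headers.
    const dos_hdr = try in.readStruct(win32.IMAGE_DOS_HEADER);
    try f_in.seekTo(@intCast(dos_hdr.e_lfanew));
    var nt_hdr = try in.readStruct(win32.IMAGE_NT_HEADERS64);
    const num_sections_original: usize = @intCast(nt_hdr.FileHeader.NumberOfSections);
    const len_original_sections = num_sections_original * @sizeOf(win32.IMAGE_SECTION_HEADER);

    // The section table is assumed to start right after the NT headers; bail out on a truncated file.
    const n = try in.readAll(buf[0..len_original_sections]);
    if (n != len_original_sections) return error.UnexpectedEndOfFile;

    // Reinterpret the raw bytes as a slice of section headers.
    var sections: []win32.IMAGE_SECTION_HEADER = undefined;
    sections.ptr = @alignCast(@ptrCast(&buf));
    sections.len = num_sections_original;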

From there, the code can iterate over the sections and print out the table values; it didn’t take too long to reverse-engineer the logic just by looking at the data in the section headers and some properties in the global headers.
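Before trusting any of the parsed values, it can help to sanity-check the fixed markers of a PE file: the "MZ" magic at offset 0, the `e_lfanew` field at offset 0x3c, the "PE\0\0" signature it points to, and the optional-header magic 0x20b for PE32+ (the same `Magic` value shown in the dump output in this post). The sketch below works on a plain byte slice rather than the win32 structs; `locateCoffHeader` and its error names are hypothetical, and it assumes a Zig 0.12 to 0.14 era standard library.

```zig
const std = @import("std");

const PeError = error{ TooShort, BadDosMagic, BadPeSignature, NotPe32Plus };

/// Validates the PE markers and returns the file offset of the COFF file
/// header, which starts just past the 4-byte "PE\0\0" signature.
fn locateCoffHeader(bytes: []const u8) PeError!usize {
    if (bytes.len < 0x40) return error.TooShort;
    if (!std.mem.eql(u8, bytes[0..2], "MZ")) return error.BadDosMagic;

    // e_lfanew is the last field of the 64-byte DOS header.
    const e_lfanew = std.mem.readInt(u32, bytes[0x3c..0x40], .little);
    const pe_off: usize = e_lfanew;

    // Need the signature (4), COFF file header (20), and optional-header magic (2).
    if (bytes.len < pe_off + 26) return error.TooShort;
    if (!std.mem.eql(u8, bytes[pe_off..][0..4], "PE\x00\x00")) return error.BadPeSignature;

    // 0x20b marks a PE32+ (64-bit) optional header.
    const magic = std.mem.readInt(u16, bytes[pe_off + 24 ..][0..2], .little);
    if (magic != 0x20b) return error.NotPe32Plus;

    return pe_off + 4;
}

pub fn main() void {
    // A hand-built, otherwise empty image just large enough to pass the checks.
    var img = [_]u8{0} ** 0x80;
    img[0] = 'M';
    img[1] = 'Z';
    std.mem.writeInt(u32, img[0x3c..0x40], 0x40, .little);
    @memcpy(img[0x40..0x44], "PE\x00\x00");
    std.mem.writeInt(u16, img[0x58..0x5a], 0x20b, .little);

    if (locateCoffHeader(&img)) |off| {
        std.debug.print("COFF header at 0x{x}\n", .{off});
    } else |err| {
        std.debug.print("not a PE32+ file: {s}\n", .{@errorName(err)});
    }
}
```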

$ ./obj dump -p /usr/lib/systemd/boot/efi/linuxx64.efi.stub 
Magic                     020b
MajorLinkerVersion        0
MinorLinkerVersion        0
SizeOfCode                000000000000ce1e
SizeOfInitializedData     0000000000004110
SizeOfUninitializedData   0000000000000000
AddressOfEntryPoint       000000000000dce0
BaseOfCode                0000000000001000
ImageBase                 000000014df90000
SectionAlignment          00001000
FileAlignment             00000200
...
$ ./obj dump -h /usr/lib/systemd/boot/efi/linuxx64.efi.stub 
Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         0000ce1e  000000014df91000  000000014df91000  00000400  0   
  1 .rodata       00003ee0  000000014df9e000  000000014df9e000  0000d400  0   
  2 .data         00000088  000000014dfa2000  000000014dfa2000  00011400  0   
  3 .sbat         000000f4  000000014dfa3000  000000014dfa3000  00011600  0   
  4 .sdmagic      00000030  000000014dfa4000  000000014dfa4000  00011800  0   
  5 .reloc        00000084  000000014dfa5000  000000014dfa5000  00011a00  0   

This isn’t perfect, as I have clearly skipped over the alignment column, but I have enough confidence that the parsing is correct. The next step was to create my own IMAGE_SECTION_HEADERs from the command-line arguments.
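A new section header needs an address and a file offset that respect the SectionAlignment (0x1000) and FileAlignment (0x200) values from the optional header. The sketch below shows one plausible way to place a section directly after the previous one. It is not necessarily how my tool or objcopy picks addresses (the Arch wiki command sets them explicitly with --change-section-vma); the `Section` type and `placeAfter` are hypothetical names, and the raw size of .reloc is an assumed value.

```zig
const std = @import("std");

// Hypothetical subset of a section header: only the placement fields.
const Section = struct {
    rva: u32, // VirtualAddress
    virtual_size: u32, // VirtualSize
    file_offset: u32, // PointerToRawData
    raw_size: u32, // SizeOfRawData
};

/// Places a new section of `content_len` bytes directly after `prev`,
/// rounding the RVA up to SectionAlignment and the file offset and raw
/// size up to FileAlignment. Both alignments must be powers of two.
fn placeAfter(prev: Section, content_len: u32, section_alignment: u32, file_alignment: u32) Section {
    return Section{
        .rva = std.mem.alignForward(u32, prev.rva + prev.virtual_size, section_alignment),
        .virtual_size = content_len,
        .file_offset = std.mem.alignForward(u32, prev.file_offset + prev.raw_size, file_alignment),
        .raw_size = std.mem.alignForward(u32, content_len, file_alignment),
    };
}

pub fn main() void {
    // .reloc from the stub: VMA 0x14dfa5000 minus ImageBase 0x14df90000 gives
    // RVA 0x15000. The raw size of 0x200 is an assumption (0x84 rounded up).
    const reloc = Section{ .rva = 0x15000, .virtual_size = 0x84, .file_offset = 0x11a00, .raw_size = 0x200 };

    // A 31-byte command line such as "console=ttyS0 root=/dev/sda2 rw".
    const cmdline = placeAfter(reloc, 31, 0x1000, 0x200);
    std.debug.print("rva={x} off={x} raw={x}\n", .{ cmdline.rva, cmdline.file_offset, cmdline.raw_size });
}
```

Chaining `placeAfter` on its own result lays out each additional section in turn, which keeps the headers and the copied contents in the same order.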

After that, the code is largely just copying bytes. Start with the rest of the stub file, then move on to the contents from the arguments. The full final command is

$ ./obj copy --add-section .osrel=debian --add-section .cmdline='console=ttyS0 root=/dev/sda2 rw' --add-section .uname=6.1.0-25 --add-section .initrd=@${HOME}/.cache/metalloid/initrd.img-6.1.0-25-amd64 --add-section .linux=@${HOME}/.cache/metalloid/vmlinuz-6.1.0-25-amd64 /usr/lib/systemd/boot/efi/linuxx64.efi.stub linux.efi
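Each --add-section argument in the command above has the shape name=value, where a leading @ on the value marks a file path rather than literal content. A minimal sketch of parsing one such argument might look like the following; `parseAddSection`, `AddSection`, and `Source` are hypothetical names, not the ones in my source, and the sketch assumes a Zig 0.12 to 0.14 era standard library.

```zig
const std = @import("std");

// Where the section contents come from.
const Source = union(enum) {
    literal: []const u8, // the value itself is the content
    file: []const u8, // the value after '@' is a path to read
};

const AddSection = struct {
    name: [8]u8, // NUL-padded, as stored in a section header
    source: Source,
};

/// Splits on the first '=' only, so values such as
/// "console=ttyS0 root=/dev/sda2 rw" survive intact.
fn parseAddSection(arg: []const u8) !AddSection {
    const eq = std.mem.indexOfScalar(u8, arg, '=') orelse return error.MissingEquals;
    const name = arg[0..eq];
    const value = arg[eq + 1 ..];

    // Names in a PE image's section table are limited to 8 bytes.
    if (name.len == 0 or name.len > 8) return error.BadSectionName;

    var padded = [_]u8{0} ** 8;
    @memcpy(padded[0..name.len], name);

    const source = if (value.len > 0 and value[0] == '@')
        Source{ .file = value[1..] }
    else
        Source{ .literal = value };

    return AddSection{ .name = padded, .source = source };
}

pub fn main() !void {
    const s = try parseAddSection(".initrd=@initrd.img");
    std.debug.print("{s} from file: {}\n", .{ std.mem.sliceTo(&s.name, 0), s.source == .file });
}
```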

I won’t regurgitate all the same steps of building the disk image from the last post[^prev]. The only difference is that the ESP now contains this linux.efi file at the path /EFI/BOOT/BOOTX64.EFI, which is the fallback path that x86-64 UEFI firmware tries when no boot entry is configured.

If curious, the full source is located here.

Credits

I could not have written this code without the help of some folks who wrote incredible documentation about Portable Executables. Ero Carrera created some graphs[^graphs] that annotate a hexdump of a sample PE file, and Ahmed Hesham wrote a detailed description of the Win32 structures[^0xrick]. And thanks to Jonathan Marler for the win32 Zig bindings[^bindings].