Building UKIs
In my last post[^prev], I wrote about booting VMs using QEMU and UEFI. The result was a disk image with the Linux kernel and an initramfs in the ESP and a base Debian install in a second partition with an ext4 filesystem. While I was quite happy to have figured out how to boot with UEFI instead of SeaBIOS and GRUB, I was unhappy about how complicated the EFI variable setup remained.
Linux Command Line
The main reason for having to dance with the UEFI firmware GUI in the last post
is to set the kernel command-line. Recall that as long as the kernel is
compiled with CONFIG_EFI_STUB, the UEFI firmware can execute the kernel as an
EFI application. However, unless the entire filesystem is going to live in RAM,
the kernel needs to locate a disk, mount it, switch root, and exec an init
program. The process of finding the disk is not automatic; the kernel must be
told where to look via the command-line[^cmd].
console=ttyS0 root=/dev/sda2 rw initrd=initrd.img
Usually a bootloader is responsible for executing the kernel with arguments like
these. The root and initrd arguments are derived from how I partitioned the
disk and located certain files in the filesystems.
Unified Kernel Image
Executing the kernel with a command-line works great for a physical system like my desktop, which boots using UEFI and whose variables have not changed since I first provisioned the machine. But for virtual machines, how do I know what command-line arguments to set for any given disk image? I could pop open the hood, namely mount the partitions to figure out where the root filesystem and initramfs are located, but I tend to be allergic to friction when using developer tools. And sure, these steps could be programmed into the tool that creates the disk in the first place, but then I would still need to solve the problem of manipulating EFI variables programmatically.
As it turns out, there is a better way. A unified kernel image is, in my own paraphrase, a kernel image combined with a lightweight bootloader in a single file. At the end of the day, it’s not all that different from the user experience of running GRUB, which I find very nice. My only gripe is how obtuse the build steps are.
The Arch wiki lists a few options for building UKIs[^uki]. The ukify tool
provided by systemd works great, but I want a native solution that I can plug
into a Zig program. The fact that a UKI can be created with objcopy means
it is some sort of object file that can be parsed and modified. How hard can
that be?
Portable Executables
UEFI applications are stored in the Portable Executable file format[^pe]. The
GNU binutils programs such as objcopy and objdump are capable of reading and
writing PE files. This makes getting started with the format much easier than
looking at hexdump, and the Arch wiki even has the objcopy command for
creating a UKI. With this starting point, I should be able to dissect the logic
and recreate it in Zig code.
The objcopy command looks like
$ objcopy --add-section .cmdline="/etc/kernel/cmdline" ... /usr/lib/systemd/boot/efi/linuxx64.efi.stub linux.efi
There are a few other --add-section arguments and some --change-section-vma
arguments that I have skipped for simplicity. The linuxx64.efi.stub file is
provided by systemd-boot[^systemd-boot] and is itself a UEFI application (and
PE file). This command is making a copy of linuxx64.efi.stub named linux.efi
and adding a few additional sections.
$ objdump -h /usr/lib/systemd/boot/efi/linuxx64.efi.stub
/usr/lib/systemd/boot/efi/linuxx64.efi.stub: file format pei-x86-64
Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         0000ce1e  000000014df91000  000000014df91000  00000400  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .rodata       00003ee0  000000014df9e000  000000014df9e000  0000d400  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .data         00000088  000000014dfa2000  000000014dfa2000  00011400  2**4
                  CONTENTS, ALLOC, LOAD, DATA
  3 .sbat         000000f4  000000014dfa3000  000000014dfa3000  00011600  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .sdmagic      00000030  000000014dfa4000  000000014dfa4000  00011800  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 .reloc        00000084  000000014dfa5000  000000014dfa5000  00011a00  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
There are 6 sections already in the file, and their headers have metadata about where the binary contents are located, the size of those contents, and a few characteristics such as code vs data. All the headers are at the top of the file and are followed by all of the contents, as opposed to having each header come right before its content.
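To make that header metadata concrete, here is a minimal sketch of the 40-byte section header as the PE format lays it out. The SectionHeader type, its field names, and sampleTextHeader() are made up for this illustration (the real code uses the IMAGE_SECTION_HEADER binding), and the snippet assumes a Zig release somewhere around 0.12 to 0.14. The sample values mirror the .text row above; the raw size is an assumption inferred from where the next section starts in the file.

```zig
const std = @import("std");

/// Mirrors the 40-byte section header from the PE format.
/// Type and field names are local to this sketch.
const SectionHeader = extern struct {
    name: [8]u8, // zero-padded, not necessarily zero-terminated
    virtual_size: u32, // size of the contents once loaded
    virtual_address: u32, // offset from ImageBase once loaded
    size_of_raw_data: u32, // size of the contents in the file
    pointer_to_raw_data: u32, // file offset of the contents
    pointer_to_relocations: u32,
    pointer_to_linenumbers: u32,
    number_of_relocations: u16,
    number_of_linenumbers: u16,
    characteristics: u32, // flags such as code vs. data

    /// Returns the name with the trailing zero padding removed.
    fn nameSlice(self: *const SectionHeader) []const u8 {
        const end = std.mem.indexOfScalar(u8, &self.name, 0) orelse self.name.len;
        return self.name[0..end];
    }
};

comptime {
    // The layout only matches the file format if the struct is exactly 40 bytes.
    std.debug.assert(@sizeOf(SectionHeader) == 40);
}

/// Hypothetical header resembling the stub's .text row.
fn sampleTextHeader() SectionHeader {
    return SectionHeader{
        .name = [8]u8{ '.', 't', 'e', 'x', 't', 0, 0, 0 },
        .virtual_size = 0xce1e,
        .virtual_address = 0x1000,
        .size_of_raw_data = 0xd000, // assumption: 0xd400 - 0x400
        .pointer_to_raw_data = 0x400,
        .pointer_to_relocations = 0,
        .pointer_to_linenumbers = 0,
        .number_of_relocations = 0,
        .number_of_linenumbers = 0,
        .characteristics = 0x60000020, // code | execute | read
    };
}

pub fn main() void {
    const image_base: u64 = 0x14df90000; // ImageBase of the stub
    const text = sampleTextHeader();
    // objdump's VMA column is ImageBase plus the header's virtual address.
    std.debug.print("{s} vma=0x{x} off=0x{x}\n", .{
        text.nameSlice(),
        image_base + text.virtual_address,
        text.pointer_to_raw_data,
    });
}
```

The header only stores an address relative to ImageBase, which is why the VMA values in the table all share the same high digits.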
While the .text section with the UEFI bootloader will remain untouched, the
full objcopy command will add the sections .cmdline, .linux, and .initrd,
among a few others. And this is the light-bulb moment if my initial
description of a UKI didn’t click - the final PE file has everything inside of
it that is required for a boot. The firmware simply needs to execute the UKI.
In practical terms, the beauty of a UKI means that the disk image is portable; it can be distributed and booted without having to explore its contents (okay, except for the architecture of the UEFI applications).
Implementation
My approach to this problem was to first recreate objdump. This might seem
like a tangent to my actual goal of recreating objcopy, but the former lets
me start with just the parsing step.
The file and section headers are pretty easy to read thanks to Zig’s
GenericReader.readStruct().
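// The buffer is reinterpreted as section headers below, so give it matching alignment.
var buf: [4096]u8 align(@alignOf(win32.IMAGE_SECTION_HEADER)) = undefined;
const f_in = try fs.cwd().openFile(input_path, .{});
defer f_in.close();
const in = f_in.reader();
const dos_hdr = try in.readStruct(win32.IMAGE_DOS_HEADER);
// The NT headers live at the offset stored in the DOS header, not necessarily right after it.
try f_in.seekTo(@intCast(dos_hdr.e_lfanew));
var nt_hdr = try in.readStruct(win32.IMAGE_NT_HEADERS64);
// The section headers follow the NT headers, one fixed-size struct per section.
const num_sections_original: usize = @intCast(nt_hdr.FileHeader.NumberOfSections);
const len_original_sections = num_sections_original * @sizeOf(win32.IMAGE_SECTION_HEADER);
const n = try in.readAll(buf[0..len_original_sections]);
// A short read means the file is truncated.
if (n != len_original_sections) return error.EndOfStream;
// View the raw bytes as a slice of section headers without copying.
var sections: []win32.IMAGE_SECTION_HEADER = undefined;
sections.ptr = @alignCast(@ptrCast(&buf));
sections.len = num_sections_original;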
From there, the code can iterate over the sections and print out the table
values; it didn’t take too long to reverse-engineer the logic just by looking
at the data in the section headers and some properties in the global headers.
$ ./obj dump -p /usr/lib/systemd/boot/efi/linuxx64.efi.stub
Magic 020b
MajorLinkerVersion 0
MinorLinkerVersion 0
SizeOfCode 000000000000ce1e
SizeOfInitializedData 0000000000004110
SizeOfUninitializedData 0000000000000000
AddressOfEntryPoint 000000000000dce0
BaseOfCode 0000000000001000
ImageBase 000000014df90000
SectionAlignment 00001000
FileAlignment 00000200
...
$ ./obj dump -h /usr/lib/systemd/boot/efi/linuxx64.efi.stub
Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         0000ce1e  000000014df91000  000000014df91000  00000400  0
  1 .rodata       00003ee0  000000014df9e000  000000014df9e000  0000d400  0
  2 .data         00000088  000000014dfa2000  000000014dfa2000  00011400  0
  3 .sbat         000000f4  000000014dfa3000  000000014dfa3000  00011600  0
  4 .sdmagic      00000030  000000014dfa4000  000000014dfa4000  00011800  0
  5 .reloc        00000084  000000014dfa5000  000000014dfa5000  00011a00  0
This isn’t perfect, as I have clearly skipped over the alignment column, but I
have enough confidence that the parsing is correct. The next step was to create
my own IMAGE_SECTION_HEADERs from the command-line arguments.
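For the skipped alignment column, one plausible approach is to decode the IMAGE_SCN_ALIGN_* bits in the section's Characteristics field, where bits 20 to 23 hold a value N that stands for an alignment of 2**(N-1) bytes. The sketch below is not what my tool does. The alignmentPower() helper is hypothetical, the characteristics values in main() are illustrative rather than read from the stub, and the fallback of 2**2 when no flag is set is an assumption based on what objdump printed earlier. The PE format documents these flags for object files, so whether a given image sets them is also an assumption.

```zig
const std = @import("std");

/// Bits 20..23 of Characteristics hold the IMAGE_SCN_ALIGN_* value.
const align_mask: u32 = 0x00F0_0000;
const align_shift: u5 = 20;

/// Returns the alignment as a power of two (the N in objdump's `2**N`).
/// `default_power` is used when no alignment flag is set.
fn alignmentPower(characteristics: u32, default_power: u5) u5 {
    const field = (characteristics & align_mask) >> align_shift;
    if (field == 0) return default_power;
    // IMAGE_SCN_ALIGN_1BYTES has field value 1 and means 2**0.
    return @intCast(field - 1);
}

pub fn main() void {
    // Illustrative flags: code | 16-byte alignment | execute | read.
    const text_flags: u32 = 0x60500020;
    // Illustrative flags: initialized data | read, with no alignment bits.
    const rodata_flags: u32 = 0x40000040;

    std.debug.print(".text   2**{d}\n", .{alignmentPower(text_flags, 2)});
    std.debug.print(".rodata 2**{d}\n", .{alignmentPower(rodata_flags, 2)});
}
```

With those sample flags the two lines come out as 2**4 and 2**2, which at least matches the shape of the objdump table.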
After that, the code is largely just copying bytes. Start with the rest of the stub file, then move on to the contents from the arguments. The full final command is
$ ./obj copy --add-section .osrel=debian --add-section .cmdline='console=ttyS0 root=/dev/sda2 rw' --add-section .uname=6.1.0-25 --add-section .initrd=@${HOME}/.cache/metalloid/initrd.img-6.1.0-25-amd64 --add-section .linux=@${HOME}/.cache/metalloid/vmlinuz-6.1.0-25-amd64 /usr/lib/systemd/boot/efi/linuxx64.efi.stub linux.efi
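Each --add-section argument in that command has to become a section header plus some bytes in the file. Below is a rough sketch of the two pieces of arithmetic involved: splitting the argument, and placing the new section after the last existing one. All of the names (parseSectionArg, placeAfter, Placement, and so on) are hypothetical, the meaning of the @ prefix is inferred from the command above, and the raw size of the stub's .reloc section is assumed to be one FileAlignment unit. Treat it as an illustration under those assumptions, not the actual implementation.

```zig
const std = @import("std");

/// Where the contents of a new section come from.
const Source = union(enum) {
    literal: []const u8, // the bytes after '=' are the contents
    file: []const u8, // the bytes after "=@" name a file to read
};

const SectionArg = struct { name: []const u8, source: Source };

/// Splits one --add-section value such as ".cmdline=console=ttyS0".
fn parseSectionArg(arg: []const u8) error{ MissingEquals, BadSectionName }!SectionArg {
    const eq = std.mem.indexOfScalar(u8, arg, '=') orelse return error.MissingEquals;
    const name = arg[0..eq];
    // Section names are stored in an 8-byte field.
    if (name.len == 0 or name.len > 8) return error.BadSectionName;
    const value = arg[eq + 1 ..];
    if (value.len > 0 and value[0] == '@') {
        return SectionArg{ .name = name, .source = Source{ .file = value[1..] } };
    }
    return SectionArg{ .name = name, .source = Source{ .literal = value } };
}

/// Rounds `value` up to a multiple of `alignment`, which must be a power of two.
fn alignUp(value: u32, alignment: u32) u32 {
    return (value + alignment - 1) & ~(alignment - 1);
}

const Placement = struct {
    virtual_address: u32,
    pointer_to_raw_data: u32,
    size_of_raw_data: u32,
};

/// Places a new section directly after the previous one, in memory and on disk.
fn placeAfter(
    prev_va: u32,
    prev_virtual_size: u32,
    prev_raw_ptr: u32,
    prev_raw_size: u32,
    content_len: u32,
    section_alignment: u32,
    file_alignment: u32,
) Placement {
    return Placement{
        .virtual_address = alignUp(prev_va + prev_virtual_size, section_alignment),
        .pointer_to_raw_data = alignUp(prev_raw_ptr + prev_raw_size, file_alignment),
        .size_of_raw_data = alignUp(content_len, file_alignment),
    };
}

pub fn main() !void {
    const arg = try parseSectionArg(".cmdline=console=ttyS0 root=/dev/sda2 rw");
    const len = switch (arg.source) {
        .literal => |bytes| @as(u32, @intCast(bytes.len)),
        .file => return error.FileSourceNotHandledInSketch,
    };
    // Previous section: the stub's .reloc (raw size 0x200 is an assumption).
    const p = placeAfter(0x15000, 0x84, 0x11a00, 0x200, len, 0x1000, 0x200);
    std.debug.print("{s}: va=0x{x} off=0x{x} raw=0x{x}\n", .{
        arg.name, p.virtual_address, p.pointer_to_raw_data, p.size_of_raw_data,
    });
}
```

The two alignments come straight from the SectionAlignment and FileAlignment fields in the headers dumped earlier, which is why the virtual address jumps by a whole 0x1000 page while the file offset only moves in 0x200 steps.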
I won’t regurgitate all the same steps of building the disk image from the last
post[^prev]. The only difference is that the ESP now contains this linux.efi
file at the path /EFI/BOOT/BOOTX64.EFI, which some UEFI firmware will
automatically try to find and launch.
If curious, the full source is located here.
Credits
I could not have written this code without the help of some folks who wrote incredible documentation about Portable Executables. Ero Carrera created some graphs[^graphs] that annotate a hexdump of a sample PE file, and Ahmed Hesham wrote a detailed description of the Win32 structures[^0xrick]. And thanks to Jonathan Marler for the win32 Zig bindings[^bindings].