QEMU
I mostly work on Linux software from a Linux machine, and I prefer to build, run, and test locally as much as possible. Even when the host and target OSes match, there are sometimes other important variables to control, most notably the kernel. This is when virtualization becomes essential: containers share the host’s kernel, so tools like Docker won’t be of much help.
The kernel
The Linux kernel is often built as a statically linked executable, and it’s actually not very difficult to build.
$ git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
$ cd linux
$ git checkout v5.12
$ make menuconfig
$ make bzImage
$ ls -lh arch/x86/boot/bzImage
-rw-r--r-- 1 1000 1000 9.1M May 31 13:26 arch/x86/boot/bzImage
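If the build needs to be scripted, the interactive make menuconfig step gets in the way. Here’s a minimal sketch that swaps it for the kernel’s stock defconfig target and builds in parallel. Note that defconfig is my substitution here, not the configuration used above, and that run, build_kernel and DRY_RUN are hypothetical helper names for illustration only.

```shell
#!/bin/sh
# Sketch: build bzImage without the interactive menu.
# run, build_kernel and DRY_RUN are illustrative names, not kernel tooling.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

build_kernel() {
    src_dir="$1"
    # Generate the architecture's default .config (swapped in for menuconfig).
    run make -C "${src_dir}" defconfig || return 1
    # Build the compressed kernel image using every available CPU.
    run make -C "${src_dir}" -j"$(nproc)" bzImage
}
```

Calling build_kernel linux from the directory that contains the clone runs both steps, while DRY_RUN=1 build_kernel linux only prints them. Whether the default configuration has everything a given test needs is something to check case by case.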
Ok, so I have a kernel! But it won’t do much good on its own.
Running QEMU
Normally the kernel is used to run a conventional operating system like Arch or Debian, which are distinguished by their choice of init system, filesystem structure, package manager, etc. But it can also execute a simple program that relies on the Linux APIs (or higher-level libraries like libc).
Copying the example from https://www.kernel.org/doc/html/latest/filesystems/ramfs-rootfs-initramfs.html:
$ cat > hello.c << EOF
#include <stdio.h>
#include <unistd.h>
int main(int argc, char *argv[])
{
    printf("Hello world!\n");
    sleep(999999999);
}
EOF
$ gcc -static hello.c -o init
$ echo init | cpio -o -H newc | gzip > test.cpio.gz
$ qemu-system-x86_64 -kernel bzImage -initrd test.cpio.gz -nographic -m 512 -append 'console=ttyS0'
You’ll see a bunch of output from the kernel log, and at the end:
[ 3.678088] Run /init as init process
Hello world!
The VM will “hang” at that sleep, but the qemu process can be stopped using kill. The terminal may be left in a different input mode, which running reset will fix.
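Since the same QEMU command line keeps coming back with small variations, it can be handy to wrap it. The following is only a sketch under the assumptions of the command above (x86_64, 512 MB of RAM, serial console); run_vm and DRY_RUN are hypothetical names. As an alternative to kill, QEMU’s -nographic mode also accepts Ctrl-a x on the terminal to quit the emulator.

```shell
#!/bin/sh
# Sketch: wrap the QEMU invocation used above.
# run_vm and DRY_RUN are illustrative names.
run_vm() {
    kernel="$1"
    initrd="$2"
    extra="${3:-}"    # optional extra kernel arguments, e.g. rdinit=/sbin/init
    # Rebuild the positional parameters as the QEMU command line, so
    # arguments containing spaces survive intact.
    set -- qemu-system-x86_64 \
        -kernel "${kernel}" \
        -initrd "${initrd}" \
        -nographic \
        -m 512 \
        -append "console=ttyS0${extra:+ ${extra}}"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$@"    # one argument per line
    else
        "$@"
    fi
}
```

For example, run_vm bzImage test.cpio.gz boots the hello-world archive, and setting DRY_RUN=1 prints the arguments instead of starting the VM.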
The filesystem
The C program is cool, and shows how the kernel might be used for embedded devices. My goal is to test software that would run on a traditional linux distro, so I need to keep going.
The next necessary piece is a filesystem. This is often located in a disk partition labelled as root, but can also come from an initramfs. As it turns out, this is not very difficult to create. The cpio archive above is essentially a single-file filesystem, where that file is mounted to /init.
But what goes in the filesystem? Sure, it’s not hard to create a few directories, but at the minimum I want a shell and some utilities. That’s where Busybox comes in.
I put the following in a shell script named mkinitramfs.sh:
#!/bin/bash
set -euo pipefail
if [[ "$#" != "1" ]]; then
    echo "Usage: ./mkinitramfs.sh FILE"
    exit 1
fi
start_dir=$(pwd)
build_dir=$(mktemp -d)
cd "${build_dir}"
mkdir -p bin sbin etc/init.d dev proc sys usr/bin usr/sbin
cp "$(which busybox)" bin/busybox
# Symlink every applet to the busybox binary.
for f in $(busybox --list-full); do
    if [[ "${f}" != "bin/busybox" ]]; then
        ln -s /bin/busybox "${f}"
    fi
done
# Startup script that busybox init runs at boot.
cat << EOF > etc/init.d/rcS
#!/bin/sh
mount -t proc none /proc
mount -t sysfs none /sys
/sbin/mdev -s
[ ! -h /etc/mtab ] && ln -s /proc/mounts /etc/mtab
[ ! -f /etc/resolv.conf ] && cat /proc/net/pnp > /etc/resolv.conf
EOF
chmod +x etc/init.d/rcS
# Archive the tree as root so the files are owned by root inside the VM.
sudo chown -R root:root .
sudo find . | sudo cpio -o --format=newc | gzip > "${start_dir}/${1}"
sudo chown "$(id -u):$(id -g)" "${start_dir}/${1}"
cd "${start_dir}"
sudo rm -rf "${build_dir}"
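The symlink loop is the part of the script that does most of the work, so here is a standalone sketch of it that can be tried without building an archive. It reads applet paths from stdin, one per line, in the form that busybox --list-full prints, and it creates parent directories as needed. link_applets is a hypothetical helper name, and reading line by line is my variation on the for loop above.

```shell
#!/bin/sh
# Sketch: create one symlink per applet path, all pointing at /bin/busybox.
# link_applets is an illustrative helper; paths arrive on stdin.
link_applets() {
    root="$1"
    while IFS= read -r path; do
        # The real binary lives at bin/busybox, so it gets no symlink.
        if [ "${path}" = "bin/busybox" ]; then
            continue
        fi
        mkdir -p "${root}/$(dirname "${path}")"
        ln -s /bin/busybox "${root}/${path}"
    done
}
```

Inside the script this would be called as busybox --list-full | link_applets "${build_dir}". Reading whole lines keeps applet names such as [ from being treated as glob patterns by the shell.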
Now I can use this to generate a ramfs and pass it as the -initrd to QEMU. The kernel docs talk a bit more about initial RAM disks here.
$ ./mkinitramfs.sh initramfs.cpio.gz
$ qemu-system-x86_64 -kernel bzImage -initrd initramfs.cpio.gz -nographic -m 512 -append 'console=ttyS0 rdinit=/sbin/init'
...
[ 3.789760] Run /sbin/init as init process
starting pid 65, tty '': '/etc/init.d/rcS'
[ 3.866538] mount (66) used greatest stack depth: 14760 bytes left
[ 4.061817] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/i3
[ 4.215787] mdev (68) used greatest stack depth: 14296 bytes left
Please press Enter to activate this console.
starting pid 71, tty '': '-/bin/sh'
/ # echo 'Hello world!'
Hello world!
/ # poweroff
This time, instead of simply running a program, I am dropped into a functional shell. I can run any of the utilities provided by busybox, which even includes vi. Check out man busybox.1 for a list of all its programs.
Moving to a disk
Depending on the specific purpose for running the VM, the setup so far might be sufficient. However, most of the time I need to install some software. The filesystem is entirely in RAM, so I’ll quickly run out of space. And, naturally, any state I change in the filesystem will not be preserved across reboots.
In the script, the build_dir variable is just a path to an empty directory. With a few changes, it becomes the mount point of a disk image that can be modified and saved:
build_dir=$(mktemp -d)
fallocate -l 4G "${1}"
mkfs.ext4 "${1}"
sudo mount -o loop "${1}" "${build_dir}"
cd "${build_dir}"
Then at the end, instead of creating an archive via
sudo find . | sudo cpio -o --format=newc | gzip > "${start_dir}/${1}"
sudo chown $(id -u):$(id -g) "${start_dir}/${1}"
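I only need to run sudo umount ${build_dir} && rmdir ${build_dir}. This has to happen after the cd back to ${start_dir}, because a filesystem can’t be unmounted while it contains the shell’s working directory.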
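One thing to watch with the disk variant is that the script runs under set -e, so a failure halfway through would leave the image mounted. A minimal sketch of a cleanup function that can be registered with trap follows. It assumes the start_dir and build_dir variables from the script and the mountpoint utility being available; cleanup is a hypothetical name.

```shell
#!/bin/sh
# Sketch: undo the mount and remove the temporary directory.
# Relies on start_dir and build_dir being set by the calling script.
cleanup() {
    # Leave the mount point first; umount fails while it is the cwd.
    cd "${start_dir}" || return 1
    # Only unmount if the image actually got mounted.
    if mountpoint -q "${build_dir}"; then
        sudo umount "${build_dir}"
    fi
    rmdir "${build_dir}"
}
```

In the script, a trap cleanup EXIT line placed right after build_dir is created would run this on both success and failure, replacing the explicit umount at the end.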
Lastly, the QEMU command changes slightly:
$ ./mkinitramfs.sh disk.img
$ qemu-system-x86_64 -kernel bzImage -drive file=disk.img,format=raw -nographic -m 512 -append 'console=ttyS0 root=/dev/sda rw'
...
Please press Enter to activate this console.
starting pid 74, tty '': '-/bin/sh'
/ #
Not only does the filesystem have more space, but it’s persistent as well. I can shut it down and pick up right where I left off later.
Networking
Without an internet connection, this VM will have limited utility. I almost always want to install some software or interact with the outside world.
QEMU sets up a network by default (see here), so all I have to do is tell Linux to use it.
/ # ip link set eth0 up
/ # udhcpc
udhcpc: started, v1.34.1
udhcpc: broadcasting discover
udhcpc: broadcasting select for 10.0.2.15, server 10.0.2.2
udhcpc: lease of 10.0.2.15 obtained from 10.0.2.2, lease time 86400
/ # ip addr add 10.0.2.15/24 dev eth0
/ # ip route add default via 10.0.2.2 dev eth0
/ # echo "nameserver 1.1.1.1" > /etc/resolv.conf
/ # wget http://0x1b.me -S
Connecting to 0x1b.me (104.21.24.33:80)
HTTP/1.1 301 Moved Permanently
Date: Mon, 03 Jan 2022 23:14:38 GMT
Transfer-Encoding: chunked
Connection: close
Cache-Control: max-age=3600
Expires: Tue, 04 Jan 2022 00:14:38 GMT
Location: https://0x1b.me/
wget: not an http or ftp url: https://0x1b.me/
Pretty simple!
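The manual ip commands after udhcpc can also be automated. udhcpc runs an event script, selectable with -s, passing the event name as the first argument and the lease details in environment variables. Below is a minimal sketch of such a script. It is not the script busybox ships, the /24 fallback is my assumption, and RESOLV_CONF and udhcpc_event are hypothetical names.

```shell
#!/bin/sh
# Sketch of a udhcpc event script. udhcpc calls it with the event name
# as $1 and exports the lease details as environment variables
# (interface, ip, mask, router, dns).
# RESOLV_CONF is an illustrative override so the file can be redirected.
RESOLV_CONF="${RESOLV_CONF:-/etc/resolv.conf}"

udhcpc_event() {
    case "$1" in
        deconfig)
            # No lease yet (or lease lost): clear addresses, keep the link up.
            ip addr flush dev "$interface"
            ip link set "$interface" up
            ;;
        bound|renew)
            # Replace any previous address with the leased one.
            ip addr flush dev "$interface"
            # Assumption: fall back to /24 if the server sent no netmask.
            ip addr add "$ip/${mask:-24}" dev "$interface"
            # Only the first router becomes the default route.
            for gw in $router; do
                ip route add default via "$gw" dev "$interface"
                break
            done
            # Write one nameserver line per DNS server in the lease.
            for ns in $dns; do
                echo "nameserver $ns"
            done > "$RESOLV_CONF"
            ;;
    esac
}

udhcpc_event "${1:-}"
```

Saved somewhere in the image and made executable (say /etc/udhcpc.script, a path I picked for illustration), it would be used as udhcpc -i eth0 -s /etc/udhcpc.script. The nameserver then comes from the lease rather than being hard-coded.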
Closing thoughts
It feels great to uncover a little bit about how to run the kernel in a minimal way and see the groundwork for building a linux distro. This is still far from a full system though, perhaps most notably missing an init system like systemd or OpenRC and a package manager.
To extend this process with a well-known distro’s environment, check out