Unprivileged Linux Network Namespaces, Part 1
Linux’s network namespaces are the coolest thing since Windows Vista. OK, that’s hardly a fair comparison, but I am talking about a feature that was introduced in Linux 2.6.24, which shipped in January 2008, roughly one year after Vista was released. One of these things is still very relevant today.
The kernel has a man page[1] with a concise description of the feature. I like this sentence from man ip-netns[2]:
A network namespace is logically another copy of the network stack, with its own routes, firewall rules, and network devices.
What does that mean?
Linux namespaces are similar to namespaces in a programming language: they group a set of related resources and isolate them from another group. Two identical entities can exist in separate namespaces without causing a conflict. Container tools like Docker or Podman use a network namespace to isolate a containerized process from the host’s network, though they also usually link to the host for internet access.
Perhaps the most common way to create a network namespace is through iproute2:
$ ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: end0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 72:42:40:55:08:0b brd ff:ff:ff:ff:ff:ff
$ sudo ip netns add asdf
$ sudo ip -n asdf l # shorthand for `ip netns exec asdf ip l`
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
Inside the namespace (note the -n asdf), my computer’s ethernet device no longer exists, and the loopback device is a different one (notice that its state is DOWN rather than UNKNOWN).
How does it work?
Go see for yourself! The source code3 is not too difficult to follow.
The syscall that creates a namespace is called unshare()[4]. The Linux docs have an entire page[5] dedicated to it. There are many types of namespaces, but the network ones are created with the CLONE_NEWNET flag. Each network namespace is represented by its own instance of the kernel’s struct net, defined at the top of net_namespace.h, which stores network devices among many other fields.
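To make the unshare() call concrete, here is a minimal sketch in C. The helper names count_interfaces and interfaces_after_unshare are my own and do not come from iproute2 or the kernel. It forks first so that only the child leaves the original namespace, and it assumes the caller either has CAP_SYS_ADMIN or is happy to get a negative errno back.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <net/if.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Count the interfaces visible in the caller's network namespace.
 * Returns the count, or -errno if the list could not be read. */
int count_interfaces(void)
{
    struct if_nameindex *list = if_nameindex();
    if (list == NULL)
        return -errno;
    int n = 0;
    for (struct if_nameindex *i = list; i->if_index != 0; i++)
        n++;
    if_freenameindex(list);
    return n;
}

/* Fork a child, call unshare(flags) in it, and report what the child saw.
 * Returns the child's interface count on success, or a negative errno
 * (for example -EPERM when CLONE_NEWNET is used without CAP_SYS_ADMIN). */
int interfaces_after_unshare(int flags)
{
    int pipefd[2];
    if (pipe(pipefd) == -1)
        return -errno;

    pid_t pid = fork();
    if (pid == -1) {
        int err = errno;
        close(pipefd[0]);
        close(pipefd[1]);
        return -err;
    }
    if (pid == 0) {
        /* Child: only this process moves into the new namespace. */
        close(pipefd[0]);
        int result = (unshare(flags) == 0) ? count_interfaces() : -errno;
        ssize_t w = write(pipefd[1], &result, sizeof result);
        _exit(w == (ssize_t)sizeof result ? 0 : 1);
    }

    /* Parent: still in the original namespace, just collects the answer. */
    close(pipefd[1]);
    int result = -EIO;
    if (read(pipefd[0], &result, sizeof result) != (ssize_t)sizeof result)
        result = -EIO;
    close(pipefd[0]);
    assert(pid > 0);
    waitpid(pid, NULL, 0);
    return result;
}
```

Calling interfaces_after_unshare(CLONE_NEWNET) under sudo should report a lone loopback device, in line with the ip output above; as a regular user I would expect a negative errno instead.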
After creating the namespace, iproute2 effectively stores a reference to it by mount()ing the current process’ namespace handle to the filesystem. More specifically, it bind-mounts /proc/self/ns/net to /var/run/netns/<name>. Otherwise the namespace would cease to exist as soon as the ip netns add <name> command exited.
$ ls -lh /var/run
lrwxrwxrwx 1 root root 4 Apr 15 15:59 /var/run -> /run
$ ls /run/netns/
asdf
$ mount | rg netns
tmpfs on /run/netns type tmpfs (rw,nosuid,nodev,noexec,relatime,size=395652k,mode=755,inode64)
nsfs on /run/netns/asdf type nsfs (rw)
nsfs on /run/netns/asdf type nsfs (rw)
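The persistence trick can be sketched in a few lines of C. This is a simplified approximation of what ip netns add does, not a copy of it: netns_path and netns_add are names I picked for illustration, the check for a slash in the name is my own addition, and the sketch assumes /var/run/netns already exists and skips any mount propagation setup on that directory.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>
#include <unistd.h>

#define NETNS_RUN_DIR "/var/run/netns" /* directory named in the text above */

/* Build "<NETNS_RUN_DIR>/<name>" into buf. Returns 0, or -1 if it does not fit. */
int netns_path(char *buf, size_t len, const char *name)
{
    assert(buf != NULL && name != NULL);
    int n = snprintf(buf, len, "%s/%s", NETNS_RUN_DIR, name);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Create a network namespace and pin it at NETNS_RUN_DIR/<name>.
 * Needs CAP_SYS_ADMIN. On success the calling process itself has moved
 * into the new namespace. Returns 0, or -1 with errno set. */
int netns_add(const char *name)
{
    char path[4096];

    /* Illustrative validation: keep the handle directly inside NETNS_RUN_DIR. */
    if (name[0] == '\0' || strchr(name, '/') != NULL) {
        errno = EINVAL;
        return -1;
    }
    if (netns_path(path, sizeof path, name) == -1) {
        errno = ENAMETOOLONG;
        return -1;
    }

    /* A bind mount needs an existing file to mount over. */
    int fd = open(path, O_RDONLY | O_CREAT | O_EXCL, 0);
    if (fd == -1)
        return -1;
    close(fd);

    /* Create the namespace, then pin it so it outlives this process. */
    if (unshare(CLONE_NEWNET) == -1 ||
        mount("/proc/self/ns/net", path, "none", MS_BIND, NULL) == -1) {
        int saved = errno;
        unlink(path);
        errno = saved;
        return -1;
    }
    return 0;
}
```

The bind mount is what shows up as the nsfs entry in the mount output: as long as it exists, the kernel keeps the namespace alive even with no process inside it.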
So far I have only mentioned how to create a namespace, but how do ip -n <name> ... and ip netns exec <name> ... work? For that, we have to use another syscall: setns()[6]. Its second argument uses the same CLONE_NEW* flags as unshare, but it also accepts a file descriptor. Go figure: in Linux you reference a namespace with a file. This file descriptor is obtained by simply opening the file that was previously mounted, i.e. /run/netns/asdf.
setns essentially moves a thread to a different namespace. Combine that with one of the exec* syscalls and you can execute a different program in the namespace.
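As a rough illustration of that setns-then-exec combination, here is a small sketch. The function names enter_netns and exec_in_netns are hypothetical, and this is only my approximation of what ip netns exec boils down to, not its actual implementation.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <unistd.h>

/* Move the calling thread into the network namespace referenced by `path`
 * (for example /run/netns/asdf). Returns 0, or -1 with errno set. */
int enter_netns(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd == -1)
        return -1;

    /* Passing CLONE_NEWNET makes setns() fail with EINVAL if the file
     * refers to some other type of namespace. */
    int rc = setns(fd, CLONE_NEWNET);
    int saved = errno;
    close(fd);
    errno = saved;
    return rc;
}

/* Enter the namespace, then replace this process with argv[0].
 * Only returns (with -1) if something failed. */
int exec_in_netns(const char *path, char *const argv[])
{
    if (enter_netns(path) == -1)
        return -1;
    execvp(argv[0], argv);
    return -1;
}
```

With the namespace from earlier, exec_in_netns("/run/netns/asdf", argv) with argv holding "ip", "l" and a terminating NULL would be the moral equivalent of ip netns exec asdf ip l, and it needs the same privileges.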
Broader use of unshare
I’ve covered the basics using iproute2 as an example, but there are more tools that make use of Linux’s namespace feature. The util-linux repo has another program that calls unshare: unshare.c[7].
$ sudo unshare -n
# ip l
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
# exit
$
Unlike iproute2, this program does not persist the namespace; it only executes a program inside it (which is a shell by default).
You’ve probably noticed that I have used sudo for creating these namespaces, and that’s because CLONE_NEWNET requires CAP_SYS_ADMIN. Well, unshare-the-program supports all the other kinds of namespaces too. For example:
$ whoami
jordan
$ unshare -U
$ whoami
nobody
When creating a user namespace, the program has a handy feature (the -r flag) where it can map the current user’s UID to the root user inside the namespace. This is privilege escalation stuck inside a namespace separate from the rest of the system. I wouldn’t quite call it a sandbox, but it does give you permissions inside the namespace you wouldn’t otherwise have in the root/default namespace. I rhetorically wonder if this solves the permission problem with network namespaces?
$ unshare -Urn
$ whoami
root
So with the combination of CLONE_NEWUSER and CLONE_NEWNET, which are OR’d together in the flags argument to unshare, you can create a network namespace without your user having CAP_SYS_ADMIN. This is my go-to method for quickly tinkering with network config, e.g. exploring BPF programs or nftables rules.
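For the curious, this is roughly what I understand unshare -Urn to be doing underneath, written as a sketch rather than a port of the util-linux source. The helper names (format_root_map, write_file, unshare_user_and_net) are mine, and the sketch assumes a kernel that permits unprivileged user namespaces.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Format a one-line ID map: ID 0 inside maps to `outside_id`, range length 1.
 * Returns the string length, or -1 if buf is too small. */
int format_root_map(char *buf, size_t len, unsigned outside_id)
{
    assert(buf != NULL);
    int n = snprintf(buf, len, "0 %u 1\n", outside_id);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Write a whole string to an existing file in a single write().
 * Returns 0, or -1 on failure. */
int write_file(const char *path, const char *text)
{
    int fd = open(path, O_WRONLY | O_CLOEXEC);
    if (fd == -1)
        return -1;
    size_t len = strlen(text);
    ssize_t w = write(fd, text, len);
    int saved = errno;
    close(fd);
    errno = saved;
    return (w == (ssize_t)len) ? 0 : -1;
}

/* New user + network namespaces, with the caller's UID and GID mapped to 0
 * inside. Returns 0 on success, -1 on failure. */
int unshare_user_and_net(void)
{
    /* Read the IDs first: after unshare() they are unmapped until we write the maps. */
    uid_t uid = geteuid();
    gid_t gid = getegid();
    char map[64];

    if (unshare(CLONE_NEWUSER | CLONE_NEWNET) == -1)
        return -1;

    /* An unprivileged process must deny setgroups() before writing gid_map. */
    if (write_file("/proc/self/setgroups", "deny") == -1)
        return -1;
    if (format_root_map(map, sizeof map, uid) == -1 ||
        write_file("/proc/self/uid_map", map) == -1)
        return -1;
    if (format_root_map(map, sizeof map, gid) == -1 ||
        write_file("/proc/self/gid_map", map) == -1)
        return -1;
    return 0;
}
```

After a successful unshare_user_and_net(), exec'ing a shell should give the same root-inside-the-namespace experience as the command above. Until the maps are written, whoami reports nobody, which is what the plain unshare -U example showed.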
The downside of unshare -Urn is that the namespace isn’t easily referenced by other processes. For example, when using iproute2, I can ip netns exec <name> to open up several shells, which is handy for running tcpdump or bpftool prog tracelog. The unshare program can’t help me with this.
Broader use of setns
But going back to the basics, I know that the new namespace my shell is running in can be referenced by /proc/self/ns/net. And /proc/self is a symlink to a directory named with the process ID:
$ ls -lh /proc/self
lrwxrwxrwx 1 root root 0 Dec 31 1969 /proc/self -> 212116
So I should be able to enter the shell’s namespaces via
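/* Join the user namespace first: that grants the capabilities the next setns() needs. */
int fd = open("/proc/212116/ns/user", O_RDONLY);
setns(fd, CLONE_NEWUSER);
close(fd);
/* Then join the network namespace owned by that user namespace. */
fd = open("/proc/212116/ns/net", O_RDONLY);
setns(fd, CLONE_NEWNET);
close(fd);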
Turns out, the util-linux repo has a program called nsenter[8] to do exactly that.
$ nsenter -U -n --preserve-credentials -t 212116
$ whoami
root
The UX of this flow is lame, but it does prove I can recreate ip netns exec without elevated privileges.
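Putting the pieces together, an unprivileged stand-in for ip netns exec could look something like the sketch below. It is only an outline under my own assumptions: join_ns and exec_in_pid_netns are invented names, the target is identified by PID as in the nsenter example, and the caller must be the same user who created the target user namespace.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Join one namespace of process `pid`. `kind` is the file name under
 * /proc/<pid>/ns ("user" or "net") and `nstype` the matching CLONE_NEW* flag.
 * Returns 0, or -1 with errno set. */
int join_ns(pid_t pid, const char *kind, int nstype)
{
    char path[64];
    int n = snprintf(path, sizeof path, "/proc/%ld/ns/%s", (long)pid, kind);
    if (n < 0 || (size_t)n >= sizeof path) {
        errno = ENAMETOOLONG;
        return -1;
    }
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd == -1)
        return -1;
    int rc = setns(fd, nstype);
    int saved = errno;
    close(fd);
    errno = saved;
    return rc;
}

/* Join the user namespace first (which grants capabilities inside it),
 * then the network namespace it owns, then exec the command.
 * Only returns (with -1) if something failed. */
int exec_in_pid_netns(pid_t pid, char *const argv[])
{
    assert(pid > 0 && argv != NULL && argv[0] != NULL);
    if (join_ns(pid, "user", CLONE_NEWUSER) == -1)
        return -1;
    if (join_ns(pid, "net", CLONE_NEWNET) == -1)
        return -1;
    execvp(argv[0], argv);
    return -1;
}
```

The order matters: joining the network namespace first would fail with EPERM, because the caller has no capabilities over it until it is inside the owning user namespace.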
[1] https://man7.org/linux/man-pages/man7/network_namespaces.7.html
[3] https://git.kernel.org/pub/scm/network/iproute2/iproute2.git/tree/ip/ipnetns.c?h=v6.4.0
[7] https://git.kernel.org/pub/scm/utils/util-linux/util-linux.git/tree/sys-utils/unshare.c?h=v2.39
[8] https://git.kernel.org/pub/scm/utils/util-linux/util-linux.git/tree/sys-utils/nsenter.c?h=v2.39