Podman and user namespaces: A marriage made in heaven

Podman, part of the libpod library, enables users to manage pods, containers, and container images. In my last article, I wrote about Podman as a more secure way to run containers. Here, I'll explain how to use Podman to run containers in separate user namespaces.

I have always thought of user namespace, primarily developed by Red Hat's Eric Biederman, as a great feature for separating containers. User namespace allows you to specify a user identifier (UID) and group identifier (GID) mapping to run your containers. This means you can run as UID 0 inside the container and UID 100000 outside the container. If your container processes escape the container, the kernel will treat them as UID 100000. Not only that, but any file object owned by a UID that isn't mapped into the user namespace will be treated as owned by "nobody" (65534, kernel.overflowuid), and the container process will not be allowed access unless the object is accessible by "other" (world readable/writable).

If you have a file owned by "real" root with permissions 660, and the container processes in the user namespace attempt to read it, they will be prevented from accessing it and will see the file as owned by nobody.
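The rule above can be modeled with a little shell arithmetic. This is only a sketch under simplifying assumptions, not the kernel's full permission check: the helper names `owner_seen_in_namespace` and `other_can_read` are made up for illustration, the map is a single `container_start:host_start:length` triple (the same shape Podman's --uidmap takes), 65534 is assumed to be the value of kernel.overflowuid, and only the read bit for "other" is considered.

```shell
#!/bin/sh
# Assumption: kernel.overflowuid is at its usual default of 65534.
OVERFLOW_UID=65534

# owner_seen_in_namespace MAP HOST_UID
# Prints the UID that a file owned by HOST_UID appears to have inside a user
# namespace described by MAP (container_start:host_start:length).
owner_seen_in_namespace() {
    map=$1
    host_uid=$2
    c_start=${map%%:*}      # text before the first ":"
    rest=${map#*:}          # text after the first ":"
    h_start=${rest%%:*}
    length=${rest#*:}
    if [ "$host_uid" -ge "$h_start" ] && [ "$host_uid" -lt $((h_start + length)) ]; then
        echo $((c_start + host_uid - h_start))
    else
        # Unmapped owners show up as the overflow UID ("nobody").
        echo "$OVERFLOW_UID"
    fi
}

# other_can_read MODE
# Succeeds if the octal MODE (for example 660) grants read access to "other".
other_can_read() {
    [ $(( 0$1 & 4 )) -ne 0 ]
}

owner_seen_in_namespace 0:100000:5000 0        # host root is unmapped: 65534
owner_seen_in_namespace 0:100000:5000 100000   # this is root in the container: 0

if other_can_read 660; then echo "readable"; else echo "denied"; fi   # denied
```

With mode 660 the "other" bits are empty, so a process that sees the file as owned by nobody has no way in, which matches the behavior described above.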

An example

Here's how that might work. First, I create a file in my system owned by root.

$ sudo bash -c "echo Test > /tmp/test"

$ sudo chmod 660 /tmp/test

$ sudo ls -l /tmp/test

-rw-rw----. 1 root root 8 Nov 30 07:40 /tmp/test

Next, I volume-mount the file into a container running with a user namespace map 0:100000:5000.

$ sudo podman run -ti -v /tmp/test:/tmp/test:Z --uidmap 0:100000:5000 fedora sh

# id

uid=0(root) gid=0(root) groups=0(root)

# ls -l /tmp/test

-rw-rw----. 1 nobody nobody 8 Nov 30 12:40 /tmp/test

# cat /tmp/test

cat: /tmp/test: Permission denied

The --uidmap setting above tells Podman to map a range of 5000 UIDs: UIDs 0-4999 inside the container correspond to UIDs 100000-104999 outside the container. So if my process is running as UID 1 inside the container, it is UID 100001 on the host.
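The arithmetic behind that mapping is simple enough to sketch as a shell function. The name `container_to_host_uid` is hypothetical, and the function only handles a single `container_start:host_start:length` triple like the one passed to --uidmap above.

```shell
#!/bin/sh
# container_to_host_uid MAP CONTAINER_UID
# MAP has the form container_start:host_start:length.
# Prints the matching host UID, or "unmapped" if CONTAINER_UID falls outside
# the mapped range.
container_to_host_uid() {
    map=$1
    container_uid=$2
    c_start=${map%%:*}      # text before the first ":"
    rest=${map#*:}          # text after the first ":"
    h_start=${rest%%:*}
    length=${rest#*:}
    if [ "$container_uid" -ge "$c_start" ] && [ "$container_uid" -lt $((c_start + length)) ]; then
        echo $((h_start + container_uid - c_start))
    else
        echo "unmapped"
    fi
}

container_to_host_uid 0:100000:5000 0      # 100000
container_to_host_uid 0:100000:5000 1      # 100001
container_to_host_uid 0:100000:5000 4999   # 104999
container_to_host_uid 0:100000:5000 5000   # unmapped
```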

Since the real UID=0 is not mapped into the container, any file owned by root will be treated as owned by nobody. Even if the process inside the container has CAP_DAC_OVERRIDE, it can't override this protection. DAC_OVERRIDE enables root processes to read/write any file on the system, even if the file is not owned by root or world readable or writable.

User namespace capabilities are not the same as capabilities on the host. They are namespaced capabilities. This means my container root has capabilities only within the container, or really only across the range of UIDs that were mapped into the user namespace. If a container process escaped the container, it wouldn't have any capabilities over UIDs not mapped into the user namespace, including UID=0. Even if the processes could somehow enter another container, they would not have those capabilities if the container uses a different range of UIDs.

Note that SELinux and other technologies also limit what would happen if a container process broke out of the container.

Using `podman top` to show user namespaces

We have added features to podman top to allow you to examine the usernames of processes running inside a container and identify their real UIDs on the host.

Let's start by running a sleep container using our UID mapping.

$ sudo podman run --uidmap 0:100000:5000 -d fedora sleep 1000

Now run podman top:

$ sudo podman top --latest user huser

USER HUSER

root 100000

$ ps -ef | grep sleep

100000 21821 21809 0 08:04 ? 00:00:00 /usr/bin/coreutils --coreutils-prog-shebang=sleep /usr/bin/sleep 1000

Notice that podman top reports the user process is running as root inside the container but as UID 100000 on the host (HUSER). The ps command also confirms that the sleep process is running as UID 100000.

Now let's run a second container, but this time we will choose a separate UID map starting at 200000.

$ sudo podman run --uidmap 0:200000:5000 -d fedora sleep 1000

$ sudo podman top --latest user huser

USER HUSER

root 200000

$ ps -ef | grep sleep

100000 21821 21809 0 08:04 ? 00:00:00 /usr/bin/coreutils --coreutils-prog-shebang=sleep /usr/bin/sleep 1000

200000 23644 23632 1 08:08 ? 00:00:00 /usr/bin/coreutils --coreutils-prog-shebang=sleep /usr/bin/sleep 1000

Notice that podman top reports the second container is running as root inside the container but as UID=200000 on the host.

Also look at the ps command: it shows both sleep processes running, one as 100000 and the other as 200000.

This means running the containers inside separate user namespaces gives you traditional UID separation between processes, which has been the standard security tool of Linux/Unix from the beginning.
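That separation only holds if the host ranges given to different containers do not overlap. As a quick sanity check before starting a container, the two ranges can be compared with a helper like the one below. The function name `ranges_overlap` is made up for this sketch, and it compares only the host-side start and length of each map.

```shell
#!/bin/sh
# ranges_overlap START_A LEN_A START_B LEN_B
# Succeeds (exit status 0) if the two host UID ranges share at least one UID.
ranges_overlap() {
    end_a=$(($1 + $2))   # first UID past range A
    end_b=$(($3 + $4))   # first UID past range B
    [ "$1" -lt "$end_b" ] && [ "$3" -lt "$end_a" ]
}

# The two maps used above: 100000-104999 and 200000-204999.
if ranges_overlap 100000 5000 200000 5000; then
    echo "overlap"
else
    echo "disjoint"
fi
```

For the two containers above this prints "disjoint", so no host UID is shared between them.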

Problems with user namespaces

For several years, I've advocated user namespace as the security tool everyone wants but hardly anyone has used. The reason is that there hasn't been any filesystem support for it, such as a shifting file system.

In containers, you want to share the base image between lots of containers. Each example above uses the Fedora base image. Most of the files in the Fedora image are owned by real UID=0. If I run a container on this image with the user namespace 0:100000:5000, by default it sees all of these files as owned by nobody, so we need to shift all of these UIDs to match the user namespace. For years, I've wanted a mount option to tell the kernel to remap these file UIDs to match the user namespace. Upstream kernel storage developers continue to investigate and make progress on this feature, but it is a difficult problem.
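To make the required shift concrete, here is a dry-run sketch of the ownership arithmetic. It does not reflect how any real tool implements the shift. The function name `shift_ownership` and the sample input lines are invented for illustration, and it assumes the same offset applies to UIDs and GIDs and that the map starts at ID 0 inside the container.

```shell
#!/bin/sh
# shift_ownership HOST_START LENGTH
# Reads "uid gid path" lines on stdin, where uid and gid are the IDs stored in
# the image, and prints the chown command that would move each file into the
# host range. Dry run only: nothing on disk is changed.
shift_ownership() {
    host_start=$1
    length=$2
    while read -r uid gid path; do
        if [ "$uid" -lt "$length" ] && [ "$gid" -lt "$length" ]; then
            echo "chown $((host_start + uid)):$((host_start + gid)) $path"
        else
            # IDs beyond the mapped range cannot be represented in the namespace.
            echo "skip $path (ID outside the mapped range)" >&2
        fi
    done
}

# Made-up sample input: one root-owned file and one owned by UID/GID 1000.
printf '%s\n' '0 0 /etc/passwd' '1000 1000 /srv/app/data' |
    shift_ownership 100000 5000
```

For the 0:100000:5000 map this prints `chown 100000:100000 /etc/passwd` and `chown 101000:101000 /srv/app/data`. Doing this for every file in an image, once per distinct range, is what makes the problem expensive without kernel support.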

Podman can use different user namespaces on the same image because of automatic chowning built into
