FS#40815 - [linux] [util-linux] Regression, file system namespaces no longer work

Attached to Project: Arch Linux
Opened by Rong (tr071) - Thursday, 12 June 2014, 00:14 GMT
Last edited by Dave Reisner (falconindy) - Thursday, 12 June 2014, 04:26 GMT
Task Type: Bug Report
Category: Kernel
Status: Closed
Assigned To: Tobias Powalowski (tpowa), Thomas Bächler (brain0), Dave Reisner (falconindy), Tom Gundersen (tomegun)
Architecture: x86_64
Severity: Medium
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 1
Private: No

Details

Description:

The "unshare" command used to create new namespaces, but now no new namespace is created.

Additional info:
* package version(s)
Linux: 3.14.6-1
Util-linux: 2.24.2-1

* config and/or log files etc.


Steps to reproduce:

>>sudo unshare -m # This should give a shell in a new mount namespace
>>mount -t tmpfs tmpfs /usr/local && mount # This should print the mount table with "/usr/local" mounted as tmpfs

Now go to another tty and type:
>>mount

What is supposed to happen:
"/usr/local" should not show up in the second shell.

What actually happened:
"/usr/local" showed up, indicating that no new filesystem namespace was created by "unshare -m".

Closed by Dave Reisner (falconindy)
Thursday, 12 June 2014, 04:26 GMT
Reason for closing: Not a bug
Comment by Dave Reisner (falconindy) - Thursday, 12 June 2014, 01:08 GMT
Nah, this is working as expected. The root filesystem is mounted with MS_SHARED, so mounts in child namespaces will be propagated back into the master namespace. This is done on early boot by systemd to aid integration with containers (and has been for years).
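The shared propagation described here can be observed from userspace: in /proc/self/mountinfo, a mount that belongs to a peer group carries an optional `shared:N` field, a slave carries `master:N`, and a mount with neither is private. The following is a minimal sketch that reports the propagation type of a given mount point; the function name `propagation_of` is hypothetical and not part of util-linux. Where your util-linux version provides it, `findmnt -o TARGET,PROPAGATION` reports similar information.

```shell
# propagation_of MOUNTPOINT [MOUNTINFO]
# Hypothetical helper: print the propagation type of MOUNTPOINT as recorded
# in a mountinfo file (default: /proc/self/mountinfo). Field 5 is the mount
# point; optional fields start at field 7 and end at the "-" separator.
# Mount points containing spaces appear escaped (e.g. \040) and are not
# handled here. If MOUNTPOINT is listed more than once, the last line wins.
propagation_of() {
    awk -v mp="$1" '
        $5 == mp {
            p = ""
            for (i = 7; i <= NF && $i != "-"; i++) {
                split($i, kv, ":")
                if (kv[1] == "shared")          t = "shared"
                else if (kv[1] == "master")     t = "slave"
                else if (kv[1] == "unbindable") t = "unbindable"
                else                            continue
                p = p (p == "" ? "" : ",") t
            }
            # No propagation tags at all means the mount is private.
            last = (p == "" ? "private" : p)
        }
        END { if (last != "") print last }
    ' "${2:-/proc/self/mountinfo}"
}
```

On a system where systemd has marked the root shared, as described above, `propagation_of /` would be expected to print `shared`.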

If you don't want mounts propagating back to the master, you can bind-mount a directory onto itself and make it a slave (or private). Mounts on the bind-mounted directory will then not be propagated.
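A sketch of the workaround described above, assuming root privileges; the helper name `make_private_mountpoint` and the directory `/srv/sandbox` are hypothetical. `mount --bind DIR DIR` turns the directory into a mount of its own, and `mount --make-private` changes only that mount's propagation, leaving the rest of the tree shared (`--make-slave` is the alternative mentioned above).

```shell
# make_private_mountpoint DIR
# Hypothetical helper (run as root): bind-mount DIR onto itself so it becomes
# a separate mount, then mark that mount private so mounts made beneath it
# are not propagated to or from other mounts in its peer group.
make_private_mountpoint() {
    dir=$1
    mkdir -p "$dir" &&
        mount --bind "$dir" "$dir" &&
        mount --make-private "$dir"
}

# Example session (as root; /srv/sandbox is a hypothetical path):
#   make_private_mountpoint /srv/sandbox
#   unshare -m sh -c 'mount -t tmpfs tmpfs /srv/sandbox && findmnt /srv/sandbox'
#   findmnt /srv/sandbox    # back in the original namespace: no tmpfs here
```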
Comment by Rong (tr071) - Thursday, 12 June 2014, 01:51 GMT
Thanks for the explanation. Is it possible to disable this feature in systemd?

I find it very annoying that the child namespace is allowed to pollute its parent, as it totally defeats the purpose of namespaces. Moreover, a child could bind mount files like passwd/shadow/sshd_config or any other evil stuff, and thus cause a security breach. It also does not make any sense with user namespaces, if Arch is going to enable them someday.
Comment by Dave Reisner (falconindy) - Thursday, 12 June 2014, 03:17 GMT
You can recursively mark your root as private, but that isn't a supported setup.
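A less invasive variant than changing the host's root, sketched here under the assumption that only the new namespace needs to be isolated: leave the host's root shared and recursively mark the mounts private inside the child, right after `unshare -m`. The function name `private_ns_shell` is hypothetical. Later util-linux releases added a `--propagation` option to `unshare` for the same purpose; check `unshare(1)` for your version.

```shell
# private_ns_shell
# Hypothetical helper (run as root): start a shell in a new mount namespace
# whose mounts are all recursively marked private first, so mounts made in
# that shell are not propagated back to the parent namespace. The host's own
# propagation settings are left unchanged.
private_ns_shell() {
    unshare -m sh -c 'mount --make-rprivate / && exec "${SHELL:-/bin/sh}"'
}

# Example (as root):
#   private_ns_shell
#   mount -t tmpfs tmpfs /usr/local   # visible only inside this shell
```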

> a child could bind mount files like passwd/shadow/sshd_config or any evil stuff, thus cause a security breach.
Well sure, an insecure setup is insecure. If you don't want this happening in a given setup, change the propagation flags. Don't think that mount namespaces are ever going to provide any real security on their own, though.
Comment by Daniel Micay (thestinger) - Thursday, 12 June 2014, 03:50 GMT
Containers where the user inside is a true superuser are not at all secure. In theory, they're meant to be secure with CLONE_NEWUSER, but Arch doesn't enable it yet, and it doesn't work well yet. It's trivial to get root on the host (or "exploit" the kernel) with CAP_SYS_ADMIN via privileged I/O and countless other things.
Comment by Rong (tr071) - Thursday, 12 June 2014, 04:23 GMT
> Don't think that mount namespaces are ever going to provide any real security on their own, though.

I'm not sure about this. Aren't namespaces the cornerstone of the hottest projects, like Docker? I heard Google is trying to offer Docker containers to the public, so they must have faith in them. Namespaces (including UID_NS) are also enabled in ChromeOS, which puts a lot of thought and effort into security.

Even if there are tons of potential security bugs in the kernel, the kernel gurus will fix them once they are disclosed. But systemd is intentionally creating a security problem that is trivial to take advantage of. To make things worse, there is no warning at all when I run "unshare", nor any way to check whether the rootfs is rshared.

I marked my rootfs with --make-rprivate, and so far, so good. It makes me wonder if Mr Poettering is working for the NSA...
