FS#36969 - [linux] 3.13 add CONFIG_USER_NS

Attached to Project: Arch Linux
Opened by Florian Klink (flokli) - Tuesday, 17 September 2013, 20:55 GMT
Last edited by Dave Reisner (falconindy) - Monday, 29 June 2015, 20:52 GMT
Task Type: Feature Request
Category: Packages: Core
Status: Closed
Assigned To: Tobias Powalowski (tpowa), Thomas Bächler (brain0)
Architecture: All
Severity: Low
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 49
Private: No

Details

Description: Add user namespaces to kernel configuration:


Support user namespaces. This allows containers, i.e. vservers, to use user namespaces to provide different user info for different servers.


This is recommended to be turned on when using lxc (lxc-checkconfig complains about it).
It's also needed to be able to run commands inside an lxc container while using virsh:

Latest libvirt has a new command for running stuff inside a container

virsh -c lxc:/// lxc-enter-namespace mycontainername -- /bin/ps -auxf

This requires a fairly new kernel(3.7 or even 3.8 kernel is preferred)
since it _needs all 6 namespaces present in /proc/self/ns to work properly_.
(from https://www.redhat.com/archives/libvirt-users/2013-February/msg00058.html)
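
The requirement quoted above can be checked on a running system. The following is a minimal sketch, not part of the original report: it assumes the six entries are named ipc, mnt, net, pid, user and uts, and the function name `missing_namespaces` is hypothetical.

```python
import os

# Assumption: these are the six namespace entries the quote refers to,
# as exposed under /proc/<pid>/ns on kernels of that era.
EXPECTED_NAMESPACES = ("ipc", "mnt", "net", "pid", "user", "uts")


def missing_namespaces(ns_dir="/proc/self/ns"):
    """Return the sorted list of expected namespace entries absent from ns_dir."""
    try:
        present = set(os.listdir(ns_dir))
    except FileNotFoundError:
        # No ns directory at all: treat every namespace as missing.
        present = set()
    return sorted(set(EXPECTED_NAMESPACES) - present)


if __name__ == "__main__":
    missing = missing_namespaces()
    if missing:
        print("missing namespaces:", ", ".join(missing))
    else:
        print("all six namespaces present")
```

On a kernel built without CONFIG_USER_NS, the `user` entry would be expected to show up in the missing list.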


Seems like this option got missed while closing  FS#16715 

Closed by  Dave Reisner (falconindy)
Monday, 29 June 2015, 20:52 GMT
Reason for closing:  Won't implement
Additional comments about closing:  The situation is not significantly different from when this bug was originally closed. There are still multiple vulnerabilities every month and upstream is still not providing a way to enable this at runtime via a sysctl flag. Arch is not going to carry an out-of-tree patch to add the sysctl flag like other distributions, so I don't think it makes sense to leave this issue open. The feature is not going to be ready for years.
Comment by Gerardo Exequiel Pozzi (djgera) - Tuesday, 17 September 2013, 23:43 GMT
This cannot be done now with 3.11, because USER_NS cannot be enabled if XFS is present. Must wait for 3.12 (see commit d6970d4b726cea6d7a9bc4120814f95c09571fc3 [enable building user namespace with xfs] and related commit 300893b08f3bc7057a7a5f84074090ba66c8b5ca [Merge tag 'xfs-for-linus-v3.12-rc1' of git://oss.sgi.com/xfs/xfs]).
Comment by Leonid Isaev (lisaev) - Tuesday, 01 October 2013, 16:39 GMT
A related Fedora bug: https://bugzilla.redhat.com/show_bug.cgi?id=917708 . Please note that there seem to be some security implications of enabling user namespaces...
Comment by Florian Klink (flokli) - Tuesday, 01 October 2013, 16:49 GMT
What about taking their approach, reverting http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5eaf563e53294d6696e651466697eb9d491f3946 "userns: Allow unprivileged users to create user namespaces"?
Comment by William Kennington (Webhostbudd) - Sunday, 06 October 2013, 03:55 GMT
I agree with Florian, allowing non-root users to take advantage of elevating themselves to a local root seems like a huge attack surface. Preferably this would be a sysctl with a huge warning attached to it when it is switched on.
Comment by Tobias Powalowski (tpowa) - Thursday, 10 October 2013, 09:38 GMT
I have not enabled USER_NS in the configs for 3.12, and will not until it is safe security-wise.
Comment by Leonid Isaev (lisaev) - Monday, 04 November 2013, 22:56 GMT
@Webhostbudd:
Allowing non-admin users to create namespaces is one of the goals of the whole "user namespace" work. For instance, Ubuntu plans to be able to deploy unprivileged containers in 14.04 [1], [2].

[1] http://s3hh.wordpress.com/2013/02/12/user-namespaces-lxc-meeting/
[2] https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1191596

@tpowa:
Well, there are no outstanding security issues with user namespaces that I'm aware of. However, the above commit has already produced at least 2 serious vulnerabilities, so I guess people at Fedora security decided to play it safe (and I agree with them). I suggest delaying enabling user namespaces by default until at least 3.13. Is it possible to rename this bug report to say "3.13" and keep it open?
Comment by Daniel Micay (thestinger) - Thursday, 16 January 2014, 02:50 GMT
This is a very useful feature even without allowing unprivileged users to use it, so I think Arch should be enabling it and, like Fedora, reverting the patch that removed the need to be a superuser.
Comment by William Kennington (Webhostbudd) - Wednesday, 19 February 2014, 04:42 GMT
@lisaev
I never said that it wasn't a neat idea, I think it is awesome. The problem is that it has hardly had any testing and exposes a much larger section of the kernel to user torture than was previously available. This is a very major change to many kernel subsystems and has already enabled new attacks. I'm not saying it's something that shouldn't ever be enabled, it just needs time to bake.
Comment by Daniel Micay (thestinger) - Tuesday, 22 April 2014, 20:34 GMT
It doesn't appear that user namespaces are very useful with the new restrictions... as far as I can tell they can not yet be mixed with a chroot in any way.
Comment by Damjan Georgievski (damjan) - Friday, 24 October 2014, 08:50 GMT
Revisit this issue?

CentOS 7, Debian 7.x, Ubuntu 14.04 all have USER_NS enabled in their kernels
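
Whether a given kernel was built with USER_NS can be read from its config. The following is a rough sketch, not taken from the thread: it assumes the running kernel exposes its config at /proc/config.gz (which requires CONFIG_IKCONFIG_PROC), and both function names are hypothetical.

```python
import gzip


def config_option_state(config_text, option="CONFIG_USER_NS"):
    """Return the option's value ('y', 'm', ...) or None if it is not set."""
    for line in config_text.splitlines():
        line = line.strip()
        # Match "OPTION=value" exactly, so CONFIG_USER_NS_FOO is not confused
        # with CONFIG_USER_NS.
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    # Covers both "# OPTION is not set" and the option being absent entirely.
    return None


def read_running_config(path="/proc/config.gz"):
    """Read the running kernel's config text (assumes /proc/config.gz exists)."""
    with gzip.open(path, "rt") as handle:
        return handle.read()
```

For example, `config_option_state(read_running_config())` returns `"y"` when the option is built in and `None` when it is not set.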
Comment by Kenton Varda (kentonv) - Sunday, 23 November 2014, 19:31 GMT
Note that Sandstorm.io does not work on Arch because of this:

https://github.com/sandstorm-io/sandstorm/issues/162

That said, there is a legitimate security issue as userns opens up a large new attack surface for local privilege-escalation exploits, and there have indeed been a few vulnerabilities discovered over the last few months (e.g. CVE-2014-5206/CVE-2014-5207).

Of course, for many installations -- especially single-user desktops and servers that run only trusted code* -- local privilege escalation may not be a big issue.

Debian and Ubuntu have the kernel.unprivileged_userns_clone sysctl to control access to this feature, which they default off (last I checked), but the Sandstorm installer assists the user in turning it on. Arch could do the same if you want to be cautious.

* Sandstorm is a server that runs untrusted code, but it already uses seccomp to prevent untrusted code from creating user namespaces.
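
The sysctl mentioned above comes from a distribution patch, so its file only exists on kernels that carry that patch. The following is a minimal sketch of how an installer might inspect it, assuming the usual mapping of `kernel.unprivileged_userns_clone` to a file under /proc/sys; the function name is hypothetical and this is not the Sandstorm installer's actual code.

```python
def unprivileged_userns_state(path="/proc/sys/kernel/unprivileged_userns_clone"):
    """Return 'enabled', 'disabled', or 'absent' for the patched-in sysctl."""
    try:
        with open(path) as handle:
            value = handle.read().strip()
    except FileNotFoundError:
        # Assumption: a missing file means the kernel does not carry the patch.
        return "absent"
    return "enabled" if value == "1" else "disabled"


if __name__ == "__main__":
    print("kernel.unprivileged_userns_clone:", unprivileged_userns_state())
```

An "absent" result says nothing about whether user namespaces themselves are compiled in; it only means there is no runtime switch to toggle.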
Comment by Daniel Micay (thestinger) - Monday, 24 November 2014, 03:55 GMT
> Of course, for many installations -- especially single-user desktops and servers that run only trusted code* -- local privilege escalation may not be a big issue.

That's not at all true. A local privilege escalation in the kernel is the key to escaping from the Chromium sandbox, or escalating privileges after exploiting a server / other service running as an unprivileged user.

> Debian and Ubuntu have the kernel.unprivileged_userns_clone sysctl to control access to this feature, which they default off (last I checked), but the Sandstorm installer assists the user in turning it on. Arch could do the same if you want to be cautious.

Arch doesn't add new features via patches. If you want to see this feature enabled, then land something like this upstream. Note that CONFIG_USER_NS is already enabled in the linux-grsec package because it fully removes the ability to have unprivileged user namespaces.
Comment by Kenton Varda (kentonv) - Monday, 24 November 2014, 05:37 GMT
> That's not at all true. A local privilege escalation in the kernel is the key to escaping from the Chromium sandbox,

Sure, let's talk about Chrome, because it's actually pretty relevant.

Chrome's sandbox uses seccomp to prohibit exotic kernel features, so enabling unprivileged user namespaces has no immediate effect on Chrome's security.

Meanwhile Chrome currently relies on a setuid binary to set up its sandbox, because unshare() requires privilege -- unless, of course, unprivileged user namespaces are allowed. So presumably Chrome will start using userns at some point so that it can get rid of the setuid binary which is itself a security liability. So in the long run, enabling unprivileged user namespaces is actually a security win for Chrome.

> or escalating privileges after exploiting a server / other service running as an unprivileged user.

We could have a lengthy debate about the practical usefulness of UID separation in the presence of RCEs, but I think it's beside the point here, so I withdraw the statement.

> Arch doesn't add new features via patches. If you want to see this feature enabled, then land something like this upstream.

Sorry, to be clear, I only commented here to provide what I thought might be useful information for this bug. I'm not asking for anything, nor can I volunteer for anything.

> Note that CONFIG_USER_NS is already enabled in the linux-grsec package because it fully removes the ability to have unprivileged user namespaces.

Most things that use user namespaces use them explicitly because they don't require privilege. E.g. Sandstorm (like Chrome) would rather not rely on a setuid binary for sandboxing.
Comment by Daniel Micay (thestinger) - Monday, 24 November 2014, 06:06 GMT
> Chrome's sandbox uses seccomp to prohibit exotic kernel features, so enabling unprivileged user namespaces has no immediate effect on Chrome's security.

That's not true. It allows calling clone without parameter checks in some of the sandboxed processes. It doesn't allow calling it in the renderer process.

> Meanwhile Chrome currently relies on a setuid binary to set up its sandbox, because unshare() require privilege -- unless, of course, unprivileged user namespaces are allowed. So presumably Chrome will start using userns at some point so that it can get rid of the setuid binary which is itself a security liability. So in the long run, enabling unprivileged user namespaces is actually a security win for Chrome.

This doesn't make much sense. A small setuid binary is way saner than a completely broken kernel feature with a vulnerability discovered every other week. Please do a quick search for user namespaces in the kernel log. AFAICT, there has never been a disclosed privesc vulnerability for the chrome-sandbox helper. There was a potentially exploitable bug[1] but in the end it didn't appear to offer a way to escalate privileges.

[1] https://code.google.com/p/chromium/issues/detail?id=76542

> Sorry, to be clear, I only commented here to provide what I thought might be useful information for this bug. I'm not asking for anything, nor can I volunteer for anything.

If a way to disable unprivileged user namespaces by default doesn't land upstream, then Arch is not going to enable the feature. The ability to opt-in to this insanity is fine, but I can't see it being enabled by default.

> Most things that use user namespaces use them explicitly because they don't require privilege. E.g. Sandstorm (like Chrome) would rather not rely on a setuid binary for sandboxing.

Exactly, it's not actually a useful feature. The reason people want it is the ability to enter containers without root, but the feature doesn't actually provide that. Enabling it makes every user with access to clone / unshare (without a parameter check) into a superuser. Thanks to the lag in getting new kernel versions into [core] there would usually be a usable user namespace exploit available. In fact, there's one *right now* and it's not even fixed in 3.17.4 in [testing] because no fix has been created.
Comment by Kenton Varda (kentonv) - Monday, 24 November 2014, 06:23 GMT
> That's not true. It allows calling clone without parameter checks

Ouch, really? That seems like a bug in Chrome. Has someone reported it?

(Sandstorm does not allow CLONE_USERNS in clone() calls.)
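
The "parameter check" under discussion amounts to inspecting the flags argument of clone() or unshare(). The following sketch models only that decision in Python; a real seccomp filter is a BPF program, and this is not Sandstorm's or Chrome's code. It assumes the flag referred to as CLONE_USERNS is the kernel's CLONE_NEWUSER, and the function name is hypothetical.

```python
# Flag values as defined in <linux/sched.h>.
CLONE_VM = 0x00000100
CLONE_NEWNS = 0x00020000
CLONE_NEWUSER = 0x10000000


def clone_flags_allowed(flags):
    """Return True if a clone()/unshare() flags value does not request a user namespace."""
    return (flags & CLONE_NEWUSER) == 0


if __name__ == "__main__":
    print(clone_flags_allowed(CLONE_VM))                     # ordinary clone
    print(clone_flags_allowed(CLONE_NEWUSER | CLONE_NEWNS))  # asks for a userns
```

A filter that allows the syscall without looking at this argument lets the sandboxed process reach the user namespace code paths, which is the concern raised in the quoted reply.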

> a completely broken kernel feature

I take it you have an opinion about this.

> vulnerability discovered every other week

To be fair, on the off weeks, non-userns vulnerabilities are found. Anyone relying on the lack of LPE in Linux, without using seccomp, is not in a great place, sadly.

> In fact, there's one *right now*

There's also at least one unpatched non-userns LPE *right now*, that I know of. Just saying. :)

Anyway, you've made your position clear. I guess this issue should be closed again?
Comment by Daniel Micay (thestinger) - Monday, 24 November 2014, 06:41 GMT
> Ouch, really? That seems like a bug in Chrome. Has someone reported it?

It's not a bug. I'm not aware of a common seccomp sandbox that's locked down as much as it could be. It would be nice if it was done by tracing the code and identifying the minimal set of system calls and flag parameters but it doesn't work that way. For one thing, the necessary system calls / flags vary across platforms so it's not as simple as it seems. The current rules are created by fallible humans who are going to miss many opportunities to lock down specific system calls.

Originally, Chromium was just using seccomp for the renderer process but it is going to be extended to the other processes outside of that most restricted sandbox over time.

> To be fair, on the off weeks, non-userns vulnerabilities are found. Anyone relying on the lack of LPE in Linux, without using seccomp, is not in a great place, sadly.

Sure, and I'm against moving in the wrong direction. New features known to add significant attack surfaces should be opt-in at runtime. The BPF JIT compiler is a nice example of that because it's disabled by default, as it should be.
Comment by Daniel Micay (thestinger) - Monday, 24 November 2014, 06:56 GMT
I would like it if this feature was available because I have use cases for it, but it definitely shouldn't be enabled by default. I don't think upstream is going to admit that the feature is so broken, so I don't see them adding a way to opt-in at runtime. The people working on it are still convinced that they can make it quite robust in the near future, despite all of the evidence to the contrary.

Arch *could* use the out-of-tree patch to make it opt-in, but I can't see that happening. It goes against the patching policy and I don't think the kernel maintainers are interested in deviating from it for this. Anyway, this is the frustrating side to using software as shipped by upstream.
Comment by Daniel Micay (thestinger) - Monday, 24 November 2014, 06:59 GMT
Another good way for upstream to handle this would be adding a capability for entering user namespaces, so sandboxes could be marked with a USERNS capability instead of setuid / CAP_SYS_ADMIN.
Comment by Paul Colomiets (pc) - Monday, 24 November 2014, 09:16 GMT
Well, actually AFAICS it's enabled in Ubuntu by default. And there is no `kernel.unprivileged_userns_clone` flag in 14.04. Is it a Debian flag?

> The reason people want it is the ability to enter containers without root, but the feature doesn't actually provide that.

Why not? I've written the tool: https://github.com/tailhook/vagga which allows just that, without any single setuid binary.

> Enabling it makes every user with access to clone / unshare (without a parameter check) into a superuser.

Well you can't unshare into super-user. You become pseudo super-user in new namespace, but that super-user can mount and that's pretty much it. You can't setuid to real root, even if you have a setuid binary for it (and that's the reason FUSE doesn't work in namespace)
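
The "pseudo super-user" described here is a consequence of the namespace's ID mapping: UID 0 inside maps to an ordinary UID outside. The following is a small sketch of that translation, assuming the three-column format of /proc/<pid>/uid_map (ID inside, ID outside, length); the function names are hypothetical.

```python
def parse_id_map(text):
    """Parse uid_map/gid_map text into (inside_start, outside_start, length) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            entries.append(tuple(int(field) for field in fields))
    return entries


def outside_id(id_map, inside):
    """Translate an ID seen inside the namespace to the ID seen by the parent."""
    for inside_start, outside_start, length in id_map:
        if inside_start <= inside < inside_start + length:
            return outside_start + (inside - inside_start)
    return None  # unmapped IDs have no counterpart outside the namespace


if __name__ == "__main__":
    # Illustrative mapping: root inside the namespace is UID 1000 outside.
    id_map = parse_id_map("0 1000 1\n")
    print(outside_id(id_map, 0))
```

With a single-entry map like the one above, only UID 0 exists inside the namespace, and the rest of the system still sees the process as the unprivileged UID 1000.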
Comment by Daniel Micay (thestinger) - Monday, 24 November 2014, 09:38 GMT
> Why not? I've written the tool: https://github.com/tailhook/vagga which allows just that, without any single setuid binary.

The statement loses meaning when you quote it out of context... CONFIG_USER_NS=y turns all users into superusers because it doesn't work as intended. The kernel wasn't written with user namespaces in mind so there's an endless stream of privilege escalation issues via user namespaces. There are no doubt going to be a hundred more disclosed over the next few years.

> Well you can't unshare into super-user. You become pseudo super-user in new namespace, but that super-user can mount and that's pretty much it. You can't setuid to real root, even if you have a setuid binary for it (and that's the reason FUSE doesn't work in namespace)

I'm aware of how user namespaces are intended to work. The fact is that they don't actually work that way in practice, as many parts of the kernel weren't written with them in mind.
Comment by Daniel Micay (thestinger) - Monday, 24 November 2014, 09:53 GMT
Why not wait until the feature is mature before enabling it? After 6 months with no disclosed USERNS local root exploits, I'll support enabling it. If you really think the feature is ready for the real world then that shouldn't be a problem, but IMO it means it wouldn't be enabled for 5-10 years.
Comment by Daurnimator (daurnimator) - Friday, 17 April 2015, 05:30 GMT
Comment by Daniel Micay (thestinger) - Friday, 17 April 2015, 12:14 GMT
Pointing out that a bunch of vulnerabilities were recently closed is hardly evidence that it's safe to enable. There were *more* vulnerabilities discovered since that commit anyway...
Comment by Daurnimator (daurnimator) - Monday, 20 April 2015, 01:11 GMT
I was pointing out the most recent raft of fixes I know about, which occurred 4 months ago.
You yourself said that after 6 months you'd be okay turning it on, which is only 2 months away :)
Comment by Daniel Micay (thestinger) - Monday, 20 April 2015, 01:35 GMT
The most recent set of fixes landed yesterday (April 18th) and includes a fix for a vulnerability made public in October 2014. There were issues discovered in between the fixes from Eric Biederman anyway.

The Apport and Abrt vulnerabilities disclosed on April 14th were only exploitable by unprivileged users due to unprivileged user namespaces. It's going to be some time before userspace is ready for the implications of the feature, let alone the kernel itself.

We're at 2 days right now and the odds that it'll make it 6 months with no vulnerabilities discovered are pretty low... CONFIG_USER_NS is pretty much CONFIG_PRIVILEGE_ESCALATION without patching away the ability for unprivileged users to use it.
Comment by Goekcen (gokcen) - Sunday, 26 April 2015, 20:02 GMT
Why is it so difficult to add the patch which makes it a sysctl flag? Right now, there is a kernel package in AUR, which just enables the CONFIG_USER_NS flag: https://aur.archlinux.org/packages/linux-user-ns-enabled/ . Using nice virtualization software like vagga (https://vagga.readthedocs.org/en/latest/) is a pain in Arch now.
Comment by Daniel Micay (thestinger) - Tuesday, 12 May 2015, 01:14 GMT
@gokcen: I think that would be fine, but it's at the discretion of the packager. Arch typically doesn't apply downstream patches other than backports.

As for enabling it *without* that... it has now been 2 days since the last fix:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=51dfcb076d1e1ce7006aa272cb7c4514740c7e47
Comment by Daniel Micay (thestinger) - Tuesday, 12 May 2015, 01:15 GMT
"fix" as in a serious security vulnerability closed, not just a bug
Comment by Daniel Micay (thestinger) - Friday, 29 May 2015, 16:44 GMT
This one doesn't really count, since it's not a domain-specific bug:

http://www.openwall.com/lists/oss-security/2015/05/29/5

It's the consequence of ever-increasing complexity though.
Comment by Daurnimator (daurnimator) - Tuesday, 16 June 2015, 04:30 GMT
Comment by John C (tancrackers) - Monday, 29 June 2015, 20:32 GMT
Wouldn't enabling this enable Namespace Sandbox in Chromium?
