FS#62780 - [security] enable CONFIG_USERFAULTFD for the stock kernel

Attached to Project: Arch Linux
Opened by Oleg Finkelshteyn (olegfink) - Thursday, 30 May 2019, 08:19 GMT
Last edited by Jan Alexander Steffens (heftig) - Sunday, 09 August 2020, 02:07 GMT
Task Type Feature Request
Category Security
Status Closed
Assigned To Tobias Powalowski (tpowa)
Jan Alexander Steffens (heftig)
Levente Polyak (anthraxx)
Architecture All
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 4
Private No

Details

The userfaultfd mechanism⁰ has been around for a while and is enabled in RHEL/CentOS¹ and other distributions.

Is there a particular reason it isn't enabled in Arch Linux? If not, I'd like to request enabling CONFIG_USERFAULTFD for the stock linux kernel.

Thanks

0 https://www.kernel.org/doc/Documentation/vm/userfaultfd.txt
1 https://git.centos.org/rpms/kernel/blob/c7/f/SOURCES/kernel-3.10.0-x86_64.config#_225
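
For readers who want to check what their own kernel was built with, here is a minimal sketch. It assumes the running kernel exposes its configuration at /proc/config.gz (which requires CONFIG_IKCONFIG_PROC; this is not stated in the report). The function names are made up for illustration.

```python
import gzip
from pathlib import Path
from typing import Optional


def userfaultfd_config_state(config_text: str) -> Optional[str]:
    """Return the CONFIG_USERFAULTFD value ("y" when enabled),
    "n" if the option is explicitly unset, or None if it is absent."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == "# CONFIG_USERFAULTFD is not set":
            return "n"
        if line.startswith("CONFIG_USERFAULTFD="):
            return line.split("=", 1)[1]
    return None


def running_kernel_config(path: str = "/proc/config.gz") -> Optional[str]:
    """Read the running kernel's config, or None if it is not exposed.
    The default path is an assumption (needs CONFIG_IKCONFIG_PROC)."""
    p = Path(path)
    if not p.exists():
        return None
    with gzip.open(p, "rt") as fh:
        return fh.read()


if __name__ == "__main__":
    text = running_kernel_config()
    if text is None:
        print("kernel config not exposed at /proc/config.gz")
    else:
        print("CONFIG_USERFAULTFD:", userfaultfd_config_state(text))
```

The parsing is kept separate from the file access so it can also be run against a config file taken from a package or source tree.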

Closed by Jan Alexander Steffens (heftig)
Sunday, 09 August 2020, 02:07 GMT
Reason for closing: Implemented
Additional comments about closing: linux 5.8.arch1-2
Comment by Levente Polyak (anthraxx) - Friday, 21 June 2019, 18:06 GMT
Generally, userfaultfd is not particularly useful in the real world, except perhaps for what you gain in KVM.
So the question is how to weigh risk against gain. My recommendation (which I have advocated in the past as well) is that
it's better not to enable it, as the real gain is very limited.

The problem with userfaultfd is that it's an attack primitive used in kernel exploits to leverage use-after-free
bugs: it lets an attacker temporarily halt the kernel in order to exploit the bug.
An example demonstration of this primitive is CVE-2016-6187, described in a nice writeup at https://cyseclabs.com/blog/cve-2016-6187-heap-off-by-one-exploit
as well as CVE-2016-4557 and others (just to name some).

Besides being an attack primitive to exploit other bugs, it itself has proven to be the source of several
privilege escalation, information disclosure, access restriction bypass and denial of service issues.
To name a few: CVE-2019-11599 CVE-2018-18397 CVE-2017-15126

So to summarize: I don't believe this is a _really_ useful thing that a standard kernel should have, as the gain is limited
(besides aiding exploitation) :p In fact, this is the very reason this won't ever make it into linux-hardened, and I
recommend the same for general-purpose kernels.
Comment by Oleg Finkelshteyn (olegfink) - Monday, 16 September 2019, 14:23 GMT
Hi Levente,

thank you for your input.

I'm not a kernel security expert, so I will avoid commenting on the specific past vulnerabilities you mention.
I will, however, address your overarching points, specifically:

(i) uses of userfaultfd:
Even though the interface was originally designed to facilitate qemu/kvm post-copy live migration, it is in fact a generally better replacement for the conventional mprotect+sigsegv handler pattern seen in many database management systems.
Compared to mprotect+sigsegv, userfaultfd allows for better serialisation/scheduling/parallelisation of fault processing (since the fault-handling thread is no longer tied to the faulting thread) and for out-of-process memory management, including explicit memory sharing. Perhaps most interestingly, when used instead of mprotect+sigsegv it does not take mmap_sem for writing at all, which would otherwise often be a bottleneck in heavily multithreaded applications. Here's a random writeup from the early days of getting some actual work done with userfaultfd: http://tech.adroll.com/blog/data/2016/11/29/traildb-mmap-s3.html

(ii) security of userfaultfd:
I'd like to point out that all the garden-variety server distributions (RHEL/CentOS, Debian, Ubuntu) have userfaultfd enabled in their default kernels in spite of the perceived risks.
Additionally, I note that since 5.2, there is a knob to explicitly disable non-superuser access to userfaultfd, /proc/sys/vm/unprivileged_userfaultfd, which should help alleviate your concerns: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cefdca0a86be517bc390fc4541e3674b8e7803b0

In conclusion, my view is that userfaultfd is interesting and useful for memory management patterns in userspace, some of them novel, with most of its security risks mitigated since 5.2 by the ability to restrict its use to superuser only.
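
To see how the knob mentioned in (ii) is currently set, a small sketch like the following could be used. The helper names are hypothetical, and the interpretation of the values (0 restricts userfaultfd to privileged processes, anything else leaves it open to unprivileged ones) follows the description in this thread; later kernels may refine these semantics.

```python
from pathlib import Path
from typing import Optional

# Path taken from the comment above; present since Linux 5.2.
KNOB = "/proc/sys/vm/unprivileged_userfaultfd"


def unprivileged_userfaultfd(path: str = KNOB) -> Optional[int]:
    """Return the knob's integer value, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (FileNotFoundError, PermissionError):
        return None


def describe(value: Optional[int]) -> str:
    """Translate the knob value into the terms used in this thread."""
    if value is None:
        return "knob not available"
    if value == 0:
        return "restricted to privileged processes"
    return "available to unprivileged processes"


if __name__ == "__main__":
    print("userfaultfd:", describe(unprivileged_userfaultfd()))
```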
Comment by Levente Polyak (anthraxx) - Monday, 16 September 2019, 17:15 GMT
Hi Oleg,

Thank you for your input as well. I confess that my statement about it being totally useless without exception was worded too broadly.
However, let me address your overarching point related to security risk mitigation.

That others do something is by no means a technical argument for why Arch should do the same. Listing a bunch of distros adds no value in terms of technical reasoning. For example, just because Debian, RHEL, Ubuntu and others apply a huge number of (partially custom) patches to their packages doesn't mean Arch Linux will do the same; we should decide on technical grounds and not "because others do it as well".
Anyway, this doesn't matter much. What matters:

This patch in fact mitigates absolutely nothing. The only change it achieves is that people who know the knob exists can adjust it at runtime rather than needing to recompile the kernel themselves. The default behavior is still exactly the same as before 5.2: if userfaultfd is enabled, it works for the whole of userspace in an unprivileged manner. Security is something that absolutely must come by default, potentially with an option to opt out of the secure default; otherwise it achieves very little, since the vast majority of users will not investigate the thousands of knobs they may or may not adjust. It either is secure by default or it isn't, so in fact this patch did not mitigate anything at all; it just provides convenience through a runtime knob.
Actually, this patch was originally intended the other way around: secure by _default_, with a way to re-enable unprivileged access if needed. This can still be seen when carefully reading the description, in particular its bottom part: unprivileged userfaultfd was meant to be an opt-in feature rather than opt-out, but this was changed for political reasons while the patch was going through the process of getting into the tree.

So, yes, you are right that there are areas where it can be useful, but you are very wrong about its risk being mitigated by any means. However, I will create a patch for hardened that adds something like CONFIG_USERFAULTFD_UNPRIVILEGED to specify the default behavior, which I may try to upstream. In that case we could have CONFIG_USERFAULTFD enabled while not having CONFIG_USERFAULTFD_UNPRIVILEGED.
Comment by Oleg Finkelshteyn (olegfink) - Thursday, 31 October 2019, 04:55 GMT
Note that this bug report is about the mainstream kernel (package linux), not hardened.

If the consensus here is in line with anthraxx's last suggestion about having unprivileged userfaultfd disabled by default, how about adding

vm.unprivileged_userfaultfd = 0

to filesystem/sysctl (installed as /usr/lib/sysctl.d/10-arch.conf)? This will effectively make userfaultfd available only to privileged processes by default.
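
As a rough way to verify that such a fragment says what is intended before shipping it, the following sketch parses sysctl.d-style text into key/value pairs. The parser is a simplified assumption (it handles only `key = value` lines and `#`/`;` comments), and the comment line inside the sample fragment is illustrative, not part of the proposal.

```python
from typing import Dict


def parse_sysctl_fragment(text: str) -> Dict[str, str]:
    """Parse 'key = value' lines, skipping blank lines and comments."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key/value line; ignored in this sketch
        settings[key.strip()] = value.strip()
    return settings


# Sample fragment; only the setting itself comes from the proposal above.
FRAGMENT = """\
# Restrict userfaultfd to privileged processes
vm.unprivileged_userfaultfd = 0
"""

if __name__ == "__main__":
    print(parse_sysctl_fragment(FRAGMENT))
```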
Comment by Oleksandr Natalenko (post-factum) - Thursday, 06 August 2020, 11:46 GMT
Gentle bump.

Unprivileged checkpoint/restore is coming in [1], aimed at more widespread use of CRIU and lower JVM startup times.

Given that, CRIU also prefers having userfaultfd enabled to make the migration faster [2].

Thus, can we please enable userfaultfd in the Arch kernel?

Thanks.

[1] https://lore.kernel.org/lkml/20200804113202.72667-1-christian.brauner%40ubuntu.com/
[2] https://criu.org/Userfaultfd
