FS#58355 - [linux][linux-lts] crng init really slow
Attached to Project:
Arch Linux
Opened by qwerty (macrocdd) - Wednesday, 25 April 2018, 22:56 GMT
Last edited by Andreas Radke (AndyRTR) - Tuesday, 17 March 2020, 09:53 GMT
Details
Description:
After updating linux-lts 4.14.35-1 -> 4.14.36-1, boot stalls for about 10 seconds before "kernel: random: crng init done" appears. The system boots from an SSD. CPU: Intel Core i3.

Additional info:
* 4.14.36-1
* # journalctl -b

Steps to reproduce:
Apr 26 01:35:06 archlabs ntpd[383]: Listen normally on 4 wlan0 192.168.1.9:123
Apr 26 01:35:06 archlabs ntpd[383]: Listen normally on 5 wlan0 [fe80::7867:3b18:41>
Apr 26 01:35:06 archlabs ntpd[383]: new interface(s) found: waking up resolver
Apr 26 01:35:17 archlabs kernel: random: crng init done
Apr 26 01:35:17 archlabs systemd[452]: Started D-Bus User Message Bus.
Apr 26 01:35:19 archlabs systemd[452]: Starting Sound Service...
Closed by Andreas Radke (AndyRTR)
Tuesday, 17 March 2020, 09:53 GMT
Reason for closing: Won't fix
Additional comments about closing: if boot is stalling, try adding this: random.trust_cpu=1
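A minimal sketch of making the parameter persistent, assuming GRUB is the boot loader. The options already on the line ("loglevel=3 quiet" here) are placeholders for whatever your /etc/default/grub contains. The kernel parses the value as a boolean, so =1 and =on should behave the same; the parameter only exists in kernels that have CONFIG_RANDOM_TRUST_CPU (Linux 4.19 and later).

```shell
# /etc/default/grub (assumes GRUB; keep your existing options and append the new one)
GRUB_CMDLINE_LINUX_DEFAULT="loglevel=3 quiet random.trust_cpu=on"
```

Then regenerate the configuration, e.g. `# grub-mkconfig -o /boot/grub/grub.cfg`, and reboot. Other boot loaders take the parameter in their own entry files.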
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?h=v4.14.36&id=90936d903c2f34663cffe68d9845debdeb85174c
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?h=v4.14.36&id=d152fcc173149a99d6f707a5b8a80d83d906755b
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?h=v4.14.36&id=7b6b1f3a192372937164d1293b432c640ffc7c8f
# pacman -S haveged
# systemctl enable haveged
There are also several messages such as this:
random: systemd: uninitialized urandom read (16 bytes read)
Could it be that the recent kernel changes broke systemd? And why does this only affect a few people?
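To check whether a given boot is affected, the kernel's RNG messages can be pulled out of the journal. A small sketch assuming a POSIX shell with grep; the helper name rng_messages is hypothetical:

```shell
# Print the kernel RNG messages from a log read on standard input:
# the "crng init done" line and any "uninitialized urandom read" warnings.
rng_messages() {
    grep -E 'random: (crng init done|.*uninitialized urandom read)'
}
```

Usage: `journalctl -k -b | rng_messages`. A late "crng init done" timestamp preceded by "uninitialized urandom read" warnings matches the symptom described in this report.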
for linux https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?h=cd8d7a5778a4abf76ee8fe8f1bfcf78976029f8d
for linux-lts https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?h=6e513bc20ca63f594632eca4e1968791240b8f18
as Fedora has done while waiting on upstream to provide a permanent solution?
https://lkml.org/lkml/2018/4/26/103
- Fujitsu PC, Intel Core i3-4150, no built-in TPM (delayed up to 30 s, kernel 4.17.2)
- Acer Swift 3, Intel Core i5-8250U, built-in TPM (no delay during boot, kernel 4.17.2)
Regards from Germany
Reverting the patch as suggested, gave me stability problems with suspend/resume.
Queued for 4.14.60 https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/tree/queue-4.14/random-mix-rdrand-with-entropy-sent-in-from-userspace.patch?id=b77986f577d6c5bcea7d5d075b626c0752e03ec1
This isn't a distro choice. Boot parameter will always exist.
"maybe you should consider keeping CONFIG_RANDOM_TRUST_CPU=y as default and letting people change it if they so wish."
The problem with this is that people who may want to switch it off won't be aware that such a thing exists in the first place, as there are no visible changes to their system unless they look under the hood. On the other hand, with CONFIG_RANDOM_TRUST_CPU=N, people who need to switch it on will have to be aware of it, otherwise their boot may stall. CONFIG_RANDOM_TRUST_CPU=N is also the same behaviour as before Linux 4.19, so having to switch something on or use other tools like haveged won't be a regression.
Also, not trusting the CPU and its RNG really only affects the first seconds of booting: after that, network activity, keyboard etc. mix in more entropy. This option was introduced for certain low entropy situations, such as virtual machines. If someone wants to be extra cautious and it doesn't cause problems, then they can enable it (and it would be advertised in wiki/Security), but imho it's not a "must have" feature for everyone.
That's right, but that's exactly the concern. The CPU RNG was always mixed into the entropy pool AFTER boot, and that wasn't controversial at all. CONFIG_RANDOM_TRUST_CPU=Y allows the CPU RNG to be used as the seed in early boot, which could theoretically affect the subsequent entropy mix in a deterministic way.
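To see which default the running kernel was built with, the option can be looked up in the kernel configuration. A sketch assuming the kernel exposes its config at /proc/config.gz (CONFIG_IKCONFIG_PROC, assumed to be enabled here); trust_cpu_config is a hypothetical helper name:

```shell
# Read a kernel .config on standard input and print the CONFIG_RANDOM_TRUST_CPU
# line, or a note if the option is absent (kernels before 4.19 lack it).
trust_cpu_config() {
    awk '
        /^CONFIG_RANDOM_TRUST_CPU=/             { print; found = 1 }
        /^# CONFIG_RANDOM_TRUST_CPU is not set/ { print; found = 1 }
        END { if (!found) print "CONFIG_RANDOM_TRUST_CPU not present" }
    '
}
```

Usage: `zcat /proc/config.gz | trust_cpu_config`. Whatever the build-time default, random.trust_cpu= on the kernel command line overrides it at boot.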
"If someone wants to be extra cautious and it doesn't cause problems, then they can enable it"
Reading from context I think you meant "disable" not "enable".
"it's not a "must have" feature for everyone"
I know you mean the opposite, but this fits perfectly for CONFIG_RANDOM_TRUST_CPU=y: it's not a "must have" feature for everyone. I bet it's needed in 1% of use cases.
A discussion about the original patch, its intention, pros and cons.
[ 4.206676] random: crng init done
Even if it did, the solution is to use random.trust_cpu=on
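The timestamp of "crng init done" gives a quick measure of how long the stall lasted. A sketch that reads dmesg-style lines such as the one above; crng_init_time is a hypothetical name, and the pattern also tolerates the hostname and identifier that `journalctl -o short-monotonic` inserts:

```shell
# Extract the seconds-since-boot timestamp of "crng init done" from
# dmesg-style lines read on standard input, e.g. "[    4.206676] random: crng init done".
crng_init_time() {
    sed -n 's/^\[ *\([0-9][0-9]*\.[0-9]*\)\].*random: crng init done.*/\1/p'
}
```

Usage: `dmesg | crng_init_time` (reading dmesg may require root when kernel.dmesg_restrict=1), or `journalctl -k -b -o short-monotonic | crng_init_time`.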
See FS#61233 for a recent case where the delay was in the tens of seconds and random.trust_cpu=on had no effect because the CPU lacks RDRAND and RDSEED; haveged did resolve the issue.
Systemd 420 will prefer using the RDRAND processor instruction over /dev/urandom whenever it requires randomness that neither has to be crypto-grade nor should be reproducible.
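Since random.trust_cpu=on can only help when the CPU actually implements RDRAND or RDSEED, it is worth checking the CPU flags first. A sketch for x86 /proc/cpuinfo (other architectures report features differently); cpu_rng_flags is a hypothetical name:

```shell
# List which of rdrand/rdseed appear in the x86 "flags" lines of
# /proc/cpuinfo-style text read on standard input (one name per line, sorted).
cpu_rng_flags() {
    awk '/^flags[ \t]*:/ {
             for (i = 3; i <= NF; i++)
                 if ($i == "rdrand" || $i == "rdseed") seen[$i] = 1
         }
         END { for (f in seen) print f }' | sort
}
```

Usage: `cpu_rng_flags < /proc/cpuinfo`. Empty output means neither instruction is advertised, which is the FS#61233 situation where haveged was the remaining workaround.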
Possibly close as Not a bug / Won't Fix as this is what upstream has chosen to do?
If the cpu didn't support RDRAND then I would have to use haveged.
These are the two known workarounds and I don't think there's anything more we can do for now.
https://www.phoronix.com/scan.php?page=news_item&px=Systemd-RdRand-Direct also mentions a "high_quality_required" systemd option.
I guess this is related to this bug here and can be resolved by updating haveged and fixing its service file.
https://github.com/systemd/systemd/issues/13252 and
@eworm - can you please have a look?
DefaultDependencies=no
Before=sysinit.target shutdown.target
(Technically also Conflicts=shutdown.target but I'm not sure shutting down the entropy gatherer while systemd still needs bits is a good idea)
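A sketch of applying the two directives above as a systemd drop-in, so that the packaged haveged.service does not have to be edited. Assumptions: the helper name write_haveged_dropin and the file name early-boot.conf are hypothetical, and the packaged unit does not already contain these directives (check with `systemctl cat haveged.service`). Conflicts=shutdown.target is left out, following the concern above.

```shell
# Hypothetical helper: write a drop-in ordering haveged.service early in boot.
# $1 = unit directory, normally /etc/systemd/system.
write_haveged_dropin() {
    dir="$1/haveged.service.d"
    mkdir -p "$dir" || return 1
    cat > "$dir/early-boot.conf" <<'EOF'
[Unit]
DefaultDependencies=no
Before=sysinit.target shutdown.target
EOF
}
```

Usage, as root: `write_haveged_dropin /etc/systemd/system && systemctl daemon-reload`. DefaultDependencies=no drops the implicit ordering after sysinit.target, which is what lets haveged start early enough to feed the pool.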
It is still reported on the forums as an issue for new installs with GDM that have no rotational media and no keyboard input before GDM starts.
Edit:
https://lore.kernel.org/lkml/alpine.DEB.2.21.1909290010500.2636%40nanos.tec.linutronix.de/
- haveged is not necessarily a good alternative to RDRAND because it has the same problem: it cannot be verified
- The following distributions have now enabled CONFIG_RANDOM_TRUST_CPU: Debian, Ubuntu, Fedora, Alpine. In particular Ubuntu and Alpine had initially disabled it but later reverted it because of bug reports.
Any decision on the default would be a compromise: CONFIG_RANDOM_TRUST_CPU=y deals with the known, while =n deals with the unknown.
Back to technical reasoning:
Right now it's opt-in, as it should be. People who really need to trust the CPU RNG because they are affected can opt in, without exposing the majority of people who are not affected. The only thing that is really unknown is random.trust_cpu itself; I fail to see where all the blind trust comes from. Hardware vendors mostly choose performance over security to compete in the market. With all the recent discoveries of Spectre/Meltdown/L1TF/MDS/TAA/iTLB, I fail to see why we would want to blindly trust the CPU RNG on a global scale instead of letting those who _really_ need it opt in. CPU RNGs are based on closed specifications, not auditable in the classical sense, a non-blocking infinite source of numbers, and considered safe purely by the vendors themselves because "it's surely too complicated for anyone to ever understand what influences it, so it must be safe".
The current setting of disabling it by default is the only sane option for a user base that is considered technically competent enough to enable it themselves if they are really affected, need to, and have decided for themselves that trusting it is fine. It could certainly be documented better in the wiki (contributions welcome), but the setting will not be changed.
> From a technical reasoning standpoint, it's not at all important what other distros do
I very much disagree here. Arch Linux maintainers are highly capable, but so are the maintainers of other respected distributions. There have been discussions elsewhere on the subject and the current consensus is that disabling CONFIG_RANDOM_TRUST_CPU causes harm, most often in low entropy situations such as virtual machines but in real hardware as well.
For me the sane option would be to ship bug-free. An extra security-conscious user should disable RDRAND (if possible) and also disable hyper-threading, for instance, but we don't do that by default because it causes a large performance penalty. To my knowledge, there has been no demonstrated exploit of RDRAND, unlike HT.
Users have the flexibility to turn it on, like you did; no insecure default is needed, no matter what you claim. Again: why on earth should something be considered a good source of entropy for early boot (which fundamentally and implicitly determines how secure KASLR will be) if it's purely based on a closed spec and deemed secure only by its creators because "it's surely too complicated for anyone to ever understand what influences it, so it must be safe"?
The good thing about our user base is that we can expect competence: we do not enable, start, or restart systemd units for users, and we expect users to configure their systems the way they like. We are neither Debian nor Ubuntu. You are misreading my statement: it's not about the package maintainers of those distros, it's about the expectations of the user base, which frankly is fundamentally different from that of distros like Ubuntu and Debian.
> I very much disagree here. Arch Linux maintainers are highly capable, but so are the maintainers of other respected distributions. There have been discussions elsewhere on the subject and the current consensus is that disabling CONFIG_RANDOM_TRUST_CPU causes harm, most often in low entropy situations such as virtual machines but in real hardware as well.
Then do not defend your stance by saying "but some other distros did it". Defend your stance by saying "the well-respected maintainers of X distro had the following observation to make on the pros/cons of it, and I agree with their analysis".
Levente is saying, let's focus on merit-based arguments rather than simply blindly trusting another distro's judgment calls. Based purely on this argument, I have no idea why those distros made the decision they did.
Given one of the example distros is Ubuntu, it is plausible to me that the rationale was "users know nothing about computers and most of them don't have anything secure, but the distro comes preinstalled with gdm so we should optimize for this use case".
This fails to apply to archlinux for a whole bunch of reasons, including the fact that gdm is not preinstalled and the archlinux user base is explicitly targeted at people who tend to have biased ideas like "gnome is evil and DEs in general sort of suck, let me use this niche tiling WM that really enhances my personal use". I somehow doubt the i3 and sway users use gdm!
The logical conclusion here is that *iff* the tradeoff by Ubuntu and others was "our gdm users are sufficiently problematic that we're willing to make security sacrifices on their behalf", we should be definitively doing the opposite.
So: what technical arguments did these other distros use? Maybe something from them, applies to us as well.
Some use cases that may trigger the bug:
- Encrypted swap (read from /dev/urandom, block booting early)
- Losing connection to a remote server after reboot for some time (booting blocks before the network is initialized)
- Initializing encrypted LVM on a low entropy system leads to reduced security
Some comments:
https://lists.debian.org/debian-devel/2018/12/msg00204.html (the whole thread brings several pros-and-cons)
https://gitlab.alpinelinux.org/alpine/aports/issues/9960
The above are mostly server-oriented and thus security-minded distros with a technical user base.
As far as my opinion goes, I would enable RANDOM_TRUST_CPU. Stalling boots are quite painful (especially if it's not obvious why) and I think we're better served with the smoother experience than satisfying our paranoia about attacks on the RNG that haven't been demonstrated (especially if it would have to be pre-boot or early-boot).
But I also agree that our users should be competent enough to discover "if boot is stalling, try random.trust_cpu=1".
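To confirm that the parameter actually reached the running kernel, /proc/cmdline can be checked. A sketch; trust_cpu_param is a hypothetical helper that prints the last random.trust_cpu value it finds, or "unset":

```shell
# Read a kernel command line on standard input and print the value of the
# last random.trust_cpu= parameter, or "unset" if it is absent.
trust_cpu_param() {
    tr ' ' '\n' | awk -F= '
        $1 == "random.trust_cpu" { val = substr($0, length($1) + 2); set = 1 }
        END { print (set ? val : "unset") }'
}
```

Usage: `trust_cpu_param < /proc/cmdline`. Values such as 1 or on enable the trust; 0 or off disable it.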
What about enabling it after the active entropy generation lands in our kernel? Theodore Ts'o (who added the above config) thinks the HWRNG is more trustworthy than jitter entropy, and I'm inclined to agree with him.
https://github.com/systemd/systemd/commit/b62bc66018fa1ada09554e7ee46abbbfc8e6b3ad
And yet another set of kernel workaround patches to handle this borked hardware/firmware combination: https://lore.kernel.org/patchwork/patch/1115413/.
And yes, this is a CVE-worthy hardware issue with the RNG, so please stop always dragging security discussions down to "paranoia"; you can leave that part out and still be technically reasonable in your arguments.
PS: Read the kernel docs: the kernel itself neither blindly trusts nor blindly mistrusts it; it's a matter of downstream choice.
PPS: An alternative is to use the TPM's RNG if you have a TPM and decide to trust it more than the CPU, but that requires defining the trust via rng_core.default_quality.
PPPS: I wouldn't call a bug known since 2014 fairly recent: https://bugzilla.kernel.org/show_bug.cgi?id=85911
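A read-only sketch for inspecting which hardware RNG (for example a TPM) the kernel has registered and what rng_core.default_quality is currently set to. The sysfs paths are the usual hw_random and module-parameter locations, but hwrng_report is a hypothetical helper, and the default_quality file only appears where rng_core exposes that parameter:

```shell
# Hypothetical helper: print the available/current hwrng sources and
# rng_core's default_quality. $1 = sysfs root (defaults to /sys).
hwrng_report() {
    sys=${1:-/sys}
    for f in "$sys/class/misc/hw_random/rng_available" \
             "$sys/class/misc/hw_random/rng_current" \
             "$sys/module/rng_core/parameters/default_quality"; do
        if [ -r "$f" ]; then
            printf '%s: %s\n' "${f##*/}" "$(cat "$f")"
        else
            printf '%s: (not available)\n' "${f##*/}"
        fi
    done
}
```

Usage: `hwrng_report`. Setting rng_core.default_quality=<value> on the kernel command line is how trust in that source would be defined; per the rng_core parameter description, the value is an entropy estimate per 1024 bits of input, and choosing it is exactly the trust decision discussed above.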
Maybe someone is willing to write a note in our wiki so that the solution can be easily found.
The best place would be either https://wiki.archlinux.org/index.php/Arch_boot_process or https://wiki.archlinux.org/index.php/Random_number_generation, maybe with links to each other.
Then we should close this issue.