Arch Linux


FS#58355 - [linux][linux-lts] crng init really slow

Attached to Project: Arch Linux
Opened by qwerty (macrocdd) - Wednesday, 25 April 2018, 22:56 GMT
Last edited by Jan Alexander Steffens (heftig) - Sunday, 28 October 2018, 11:29 GMT
Task Type Bug Report
Category Packages: Core
Status Assigned
Assigned To Tobias Powalowski (tpowa)
Andreas Radke (AndyRTR)
Jan Alexander Steffens (heftig)
Levente Polyak (anthraxx)
Architecture x86_64
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 0%
Votes 6
Private No

Details

Description:
After updating linux-lts from 4.14.35-1 to 4.14.36-1, the kernel takes about 10 seconds to reach "random: crng init done" when booting from an SSD. The CPU is an Intel Core i3.

Additional info:
* 4.14.36-1
* # journalctl -b


Steps to reproduce:
Apr 26 01:35:06 archlabs ntpd[383]: Listen normally on 4 wlan0 192.168.1.9:123
Apr 26 01:35:06 archlabs ntpd[383]: Listen normally on 5 wlan0 [fe80::7867:3b18:41>
Apr 26 01:35:06 archlabs ntpd[383]: new interface(s) found: waking up resolver
Apr 26 01:35:17 archlabs kernel: random: crng init done
Apr 26 01:35:17 archlabs systemd[452]: Started D-Bus User Message Bus.
Apr 26 01:35:19 archlabs systemd[452]: Starting Sound Service...

Comment by qwerty (macrocdd) - Thursday, 26 April 2018, 00:03 GMT
After rolling back the kernel, the line "random: fast init done" no longer appears at the beginning of boot.
Comment by loqs (loqs) - Thursday, 26 April 2018, 00:52 GMT
Comment by loqs (loqs) - Sunday, 29 April 2018, 15:24 GMT
If the system has a TPM, is the issue still present in 4.14.37?
Comment by Dimos Dimoulis (dimosd) - Monday, 30 April 2018, 13:31 GMT
I also have this problem, crng init is delayed 45 secs. Still affects 4.14.38.
Comment by loqs (loqs) - Monday, 30 April 2018, 14:21 GMT
Please bisect between 4.14.35 and 4.14.36 and report the bad commit upstream.
Comment by qwerty (macrocdd) - Tuesday, 01 May 2018, 09:08 GMT
The 4.16.5 kernel also has this bug.

Comment by qwerty (macrocdd) - Thursday, 03 May 2018, 09:30 GMT
A workaround, though not a very good fix:
# pacman -S haveged
# systemctl enable haveged
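To see whether haveged is actually feeding the pool, the kernel's entropy estimate can be read from /proc/sys/kernel/random/entropy_avail. A minimal sketch, with helper names that are only illustrative:

```python
from pathlib import Path

ENTROPY_FILE = "/proc/sys/kernel/random/entropy_avail"

def parse_entropy(text):
    """Parse the contents of entropy_avail; return bits as int, or None."""
    try:
        return int(text.strip())
    except ValueError:
        return None

def entropy_available(path=ENTROPY_FILE):
    """Read the kernel's current entropy estimate, or None if unreadable."""
    try:
        return parse_entropy(Path(path).read_text())
    except OSError:
        return None

print("entropy_avail:", entropy_available())
```

Running it before and after starting haveged should show whether the estimate rises; on systems without that file it simply prints None.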
Comment by Dimos Dimoulis (dimosd) - Thursday, 03 May 2018, 09:46 GMT
I had noticed that pressing a few keys to provide some entropy quickly initialized the random generator.
There are also several messages such as this:
random: systemd: uninitialized urandom read (16 bytes read)
Could it be that the recent kernel changes broke systemd? And why does this only affect a few people?
Comment by loqs (loqs) - Thursday, 03 May 2018, 11:21 GMT
Once you have located which commit is the cause you could discuss that commit upstream with the kernel and systemd developers.
Comment by loqs (loqs) - Saturday, 05 May 2018, 21:06 GMT
Comment by Andreas Radke (AndyRTR) - Sunday, 06 May 2018, 07:41 GMT
Only a few people seem to be affected. It is better to use custom builds or stay with older package versions until an upstream solution is available.
Comment by Antonio Tessarolo (anthonytex) - Saturday, 09 June 2018, 13:51 GMT
Same here with 4.16.12-1-ARCH
Comment by Christian Galander (twoCore) - Saturday, 23 June 2018, 07:45 GMT
It looks like only systems without a TPM are affected:

- Fujitsu PC, Intel Core i3-4150, no TPM built-in ( delayed up to 30s - Kernel 4.17.2 )
- Acer Swift 3, Intel Core i5-8250U, TPM built-in ( no delay during boot - Kernel 4.17.2 )

Regards from Germany
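For anyone comparing machines the same way, whether the kernel has registered a TPM can be checked through sysfs. A rough sketch, assuming TPM devices show up as tpm0, tpm1, ... under /sys/class/tpm; the function name is illustrative:

```python
from pathlib import Path

def has_tpm(sysfs_dir="/sys/class/tpm"):
    """Return True if at least one tpm* device is listed under sysfs_dir."""
    base = Path(sysfs_dir)
    if not base.is_dir():
        return False
    return any(entry.name.startswith("tpm") for entry in base.iterdir())

print("TPM present:", has_tpm())
```

This only shows that a TPM driver is loaded and bound; a TPM disabled in firmware would be reported as absent.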
Comment by Dimos Dimoulis (dimosd) - Saturday, 23 June 2018, 15:31 GMT
For me, without a TPM, only linux-lts is affected; linux 4.17.2 is not. A custom 4.14 build with a configuration based on linux-stable is also unaffected.
Reverting the patch as suggested gave me stability problems with suspend/resume.
Comment by Tobias Powalowski (tpowa) - Sunday, 24 June 2018, 19:12 GMT
I'm also affected on 4.17.2.
Comment by tleo (tleo) - Wednesday, 04 July 2018, 07:48 GMT
I'm also affected on two of my machines (on 4.14 and 4.17), neither of which has a TPM.
Comment by loqs (loqs) - Sunday, 29 July 2018, 21:02 GMT
Comment by loqs (loqs) - Tuesday, 31 July 2018, 08:09 GMT
Comment by Jan (medhefgo) - Tuesday, 31 July 2018, 20:09 GMT
FYI, that doesn't fix the issue at all. It only mixes rdrand entropy into any entropy provided by userspace, which is funny when you use rng-tools to work around this: mixing rdrand entropy into rdrand entropy. Why not just give us a kernel command line option to tell the kernel to trust rdrand? Intel has much better options available to fuck us over than surreptitiously tampering with their hardware RNG.
Comment by Jan (medhefgo) - Monday, 08 October 2018, 15:16 GMT
Comment by loqs (loqs) - Friday, 26 October 2018, 19:59 GMT
Can those affected test linux 4.19.arch1-1, which has CONFIG_RANDOM_TRUST_CPU=y?
Comment by Jan (medhefgo) - Friday, 26 October 2018, 20:16 GMT
4.19.arch1-1 works for me. Though, should this really be enabled by default considering a lot of people distrust their CPU vendors?
Comment by Jan Alexander Steffens (heftig) - Sunday, 28 October 2018, 09:33 GMT
I'll probably revert the config change so you will have to boot with the parameter.
Comment by Dimos Dimoulis (dimosd) - Sunday, 28 October 2018, 09:46 GMT
I haven't yet tried 4.19, however if you do revert the option please keep it as a boot time parameter. Without it, the system appears to hang until I press a few keys and I think this is unacceptable as default behaviour. Since the kernel has several sources of entropy and the lack of trust for the CPU is more of a problem in virtual machines and such, maybe you should consider keeping CONFIG_RANDOM_TRUST_CPU=y as default and letting people change it if they so wish.
Comment by Jensen McKenzie (your_doomsday) - Sunday, 28 October 2018, 18:36 GMT
"please keep it as a boot time parameter."

This isn't a distro choice. The boot parameter will always exist.

"maybe you should consider keeping CONFIG_RANDOM_TRUST_CPU=y as default and letting people change it if they so wish."

The problem with this is that people who might want to switch it off won't know that such a thing exists in the first place, as there won't be any visible change in their system unless they look under the hood. On the other hand, with CONFIG_RANDOM_TRUST_CPU=n, people who might want to switch it on will have to be aware of it, otherwise they won't be able to boot their system. CONFIG_RANDOM_TRUST_CPU=n is also the same behaviour as before Linux 4.19, so having to switch something on or use other tools like haveged won't be a regression.
Comment by Dimos Dimoulis (dimosd) - Monday, 29 October 2018, 09:46 GMT
The behaviour introduced in 4.18 was causing problems for some people. I think it was mentioned that Fedora went as far as reverting the patch. We'll have to wait and see how other distros handle this in 4.19, but I am guessing that they will default to =y, because it will cause fewer problem reports for them.
Also, not trusting the CPU and its RNG really only affects the first seconds of booting: after that, network activity, keyboard etc. mix in more entropy. This option was introduced for certain low entropy situations, such as virtual machines. If someone wants to be extra cautious and it doesn't cause problems, then they can enable it (and it would be advertised in wiki/Security), but imho it's not a "must have" feature for everyone.
Comment by Jensen McKenzie (your_doomsday) - Monday, 29 October 2018, 10:58 GMT
"Also, not trusting the CPU and its RNG really only affects the first seconds of booting"

That's right, but that's the actual concern. The CPU RNG was always added to the entropy mix AFTER boot, and that wasn't controversial at all. CONFIG_RANDOM_TRUST_CPU=y allows the CPU RNG to be used as the seed for entropy in early boot, which theoretically can affect the later entropy mix in a deterministic way.

"If someone wants to be extra cautious and it doesn't cause problems, then they can enable it"

Reading from context I think you meant "disable" not "enable".

"it's not a "must have" feature for everyone"

I know you mean the opposite, but this fits perfectly: CONFIG_RANDOM_TRUST_CPU=y is not a "must have" feature for everyone. I bet it's needed for 1% of use cases.
Comment by Dimos Dimoulis (dimosd) - Friday, 09 November 2018, 09:53 GMT
https://lkml.org/lkml/2018/7/17/1279

A discussion about the original patch, its intention, pros and cons.
Comment by Dimos Dimoulis (dimosd) - Saturday, 12 January 2019, 07:25 GMT
With 4.19 kernels, CONFIG_RANDOM_TRUST_CPU=n doesn't cause too much of a delay any more.
[ 4.206676] random: crng init done
Even if it did, the solution is to use random.trust_cpu=on
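To confirm that the parameter actually reached the kernel, the running command line in /proc/cmdline can be inspected. A small sketch; the function name is illustrative, and it assumes that when the parameter is given more than once the last occurrence is the one that counts:

```python
def trust_cpu_setting(cmdline):
    """Return the value given to random.trust_cpu on a kernel command line.

    Returns None if the parameter is not present.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == "random.trust_cpu" and sep:
            value = val  # assumption: a later occurrence overrides an earlier one
    return value

try:
    with open("/proc/cmdline") as handle:
        print("random.trust_cpu =", trust_cpu_setting(handle.read()))
except OSError:
    print("/proc/cmdline is not readable on this system")
```

A result of None means the kernel falls back to whatever CONFIG_RANDOM_TRUST_CPU was set to at build time.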
Comment by loqs (loqs) - Saturday, 12 January 2019, 19:30 GMT
@dimosd see FS#61233 for a recent case where the delay was in the tens of seconds and random.trust_cpu=on had no effect because the CPU lacks RDRAND and RDSEED. haveged did resolve the issue.
Systemd 240 will prefer using the RDRAND processor instruction over /dev/urandom whenever it requires randomness that neither has to be crypto-grade nor should be reproducible.
Possibly close as Not a bug / Won't Fix as this is what upstream has chosen to do?
Comment by Dimos Dimoulis (dimosd) - Sunday, 13 January 2019, 10:19 GMT
With systemd 240, linux 4.19.14, cpu supports RDRAND: I get delays of 4-24 secs and no delay if I use random.trust_cpu=on
If the cpu didn't support RDRAND then I would have to use haveged.
These are the two known workarounds and I don't think there's anything more we can do for now.

https://www.phoronix.com/scan.php?page=news_item&px=Systemd-RdRand-Direct also mentions a "high_quality_required" systemd option.
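The choice between the two workarounds named above depends on whether the CPU advertises RDRAND, which on x86 shows up in the flags line of /proc/cpuinfo. A sketch of that decision, with illustrative function names and the simple rule from this thread (RDRAND present: boot parameter, otherwise haveged):

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == "flags":
            return set(value.split())
    return set()

def suggest_workaround(flags):
    """Pick a workaround based on whether the rdrand flag is present."""
    if "rdrand" in flags:
        return "random.trust_cpu=on"
    return "haveged"

try:
    with open("/proc/cpuinfo") as handle:
        print(suggest_workaround(cpu_flags(handle.read())))
except OSError:
    print("/proc/cpuinfo is not readable on this system")
```

Only the first flags line is read, on the assumption that all cores report the same features.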
