FS#54769 - [mkinitcpio] [WITH PATCH] poll_device() not always called when udev hook is not used

Attached to Project: Arch Linux
Opened by Tom Yan (tom.ty89) - Tuesday, 11 July 2017, 17:54 GMT
Last edited by Dave Reisner (falconindy) - Friday, 14 July 2017, 15:16 GMT
Task Type Bug Report
Category Arch Projects
Status Closed
Assigned To Dave Reisner (falconindy)
Architecture All
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:

poll_device() is only called in resolve_device() of init_functions:

...
case $device in
# path to kernel named block device
/dev/*)
if poll_device "$device" "$rootdelay"; then
echo "$device"
return 0
fi
;;
...

when $device is /dev/*. When $device is UUID=*|LABEL=*|PARTUUID=*|PARTLABEL=*:

...
case $device in
UUID=*|LABEL=*|PARTUUID=*|PARTLABEL=*)
dev=$(blkid -lt "$device" -o device)
if [ -z "$dev" -a "$udevd_running" -eq 1 ]; then
dev=$(tag_to_udev_path "${device%%=*}" "${device#*=}")
fi
esac
...

if the udev hook is used, $device would get "translated" to /dev/disk/by-id/* (which is a form of /dev/*) and hence got polled; but if the udev hook is not used, $device in the form of UUID=*|LABEL=*|PARTUUID=*|PARTLABEL=* will not be polled.

This is obviously an inconsistency in behaviour, and polling a device neither requires udev nor has anything to do with it anyway.

Another trivial inconsistency is, a device can sometimes be fast enough to be found by blkid so $dev would be set to something like /dev/sdX, while in some occational slow boots $dev would be set to /dev/disk/by-*/*. This could produce boot message variation that is confusing to some users.

Attaching a patch that eliminates all the udev non-sense in init_functions and fixes the aforementioned problems.

Additional info:
* package version(s): v23
This task depends upon

Closed by  Dave Reisner (falconindy)
Friday, 14 July 2017, 15:16 GMT
Reason for closing:  None
Additional comments about closing:  I don't get paid enough for this.
Comment by Eli Schwartz (eschwartz) - Tuesday, 11 July 2017, 18:04 GMT
You would be best off posting this on the arch-projects mailing list, which is where mkinitcpio development happens: https://lists.archlinux.org/listinfo/arch-projects
Comment by Dave Reisner (falconindy) - Tuesday, 11 July 2017, 18:10 GMT
> This is obviously an inconsistency in behaviour, and polling a device neither requires udev nor has anything to do with it anyway.
Why would the device show up at some point in the future if udev isn't running?
Comment by Tom Yan (tom.ty89) - Wednesday, 12 July 2017, 14:02 GMT
@falconidy,

Some devices/drivers are slow. Say USB/UAS drives. It might need as much as 5s or so for them to show up after their driver module is loaded (via the MODULES array). This has always been the actual reason why we need to poll device, not udev.

Note that the fact that USB drives being external does not mean that every user would needs the installation on them to be portable, not to mention that their drivers are being really general these days.

Also there's no reason to prevent the init / base hook to work as expected without another hook. Udev has simply never been actually relevant in this.

@eschwartz, yeah I planned to send this to the mailing list as well but really this is more of a bug than something that needed to be discussed or so, since I am not trying to make a "change", if you know what I mean

But if you guys think the patch needs more review or so for it being a bit drastic, I can put it up on the mailing list.
Comment by Dave Reisner (falconindy) - Wednesday, 12 July 2017, 14:19 GMT
Please just use udev and/or systemd.
Comment by Tom Yan (tom.ty89) - Thursday, 13 July 2017, 17:23 GMT
> Please just use udev and/or systemd.

That is not even relevant to this tbh. It's just not using udev reveals that fact the functions are written with wrong/idea assumption. IMHO one can even realize it simply by reading the current code.

> case $device in /dev/*) -> poll_device "$device" "$rootdelay"

poll_device() is only called here in resolve_device(), but then why only /dev/*? That is the problem.

And why it works with udev? Because we do tag_to_udev_path() for UUID=*|LABEL=*|PARTUUID=*|PARTLABEL=*, a.k.a "lazy resolution"; but then why is it necessary to be "lazy"? No idea. Or rather, it has never been necessary. (There is even a TODO for tag_to_udev_path(). No idea if it's still valid or if it's just leftover though. The function simply never has a point anyway, tbh.)

And these are all addressed in my patch:
1. Poll it no matter if it is a tag or a dev node, because the init support them, there's no reason we should differentiate them when it comes to polling or not.
2. No more tag_to_udev_path silliness.
3. Either test (i.e. see if it shows up via dev node check) or resolve (i.e. see if it shows up via blkid) as necessary when we poll.

Additionally:
4. When rootdelay=0, we still test/resolve it, just not wait for it.

One more goodie:
5. now "Waiting $seconds seconds for device $device ..." will be consistent. $device will not vary among /dev/* and /dev/disk/by-*/* in different boots (yes this happens even if it one uses udev), instead it will always be the value of root= that the user has specified.
Comment by Dave Reisner (falconindy) - Thursday, 13 July 2017, 18:36 GMT
I'm not convinced it's the code that has the wrong assumptions...

poll_device only makes sense on actual files (devices) that might eventually show up. You can't wait on UUID=12345... or LABEL=chickens. That's why they're translated to udev paths -- because those *will* (or could) eventually show up.

> but then why is it necessary to be "lazy". No idea. Or rather, it has never been necessary.
The comment in the code explains it... If you don't know why it's there, it's highly questionable that you then claim that it can be removed entirely. It's additionally odd that you think repeatedly calling blkid is better than waiting on udev.

> There is even a TODO for tag_to_udev_path()
It's entirely irrelevant for this discussion. I don't know why you even bring it up.

My suggestions for improvements here, in order of preference (highest first):
1) Drop the shell-based init entirely
2) Require udev in the initramfs
...
N) Accept your patch
Comment by Tom Yan (tom.ty89) - Friday, 14 July 2017, 14:59 GMT
> You can't wait on UUID=12345... or LABEL=chickens.

Yes you can. You poll them with blkid, just like you poll /dev/* (udev symlink or not) with [ -b.

> It's additionally odd that you think repeatedly calling blkid is better than waiting on udev.

Have you even ever tried booting a SATA and USB drive _respectively_ _without the udev hook_ but only the necessary two or three drivers in the MODULES array?

With SATA drives it always works, with USB (even 3.1/XHCI/UAS) drives it never works. It's the slow device/driver that we need to wait, not some insignificant overhead brought by the udev hook, if that's what you mean.

(I seriously wonder that you have some really wrong assumption for udev, coz you seem to have no idea why the udev hook is not enough for multi-device btrfs either, while the reason is so obvious)

> The comment in the code explains it... If you don't know why it's there, it's highly questionable that you then claim that it can be removed entirely.

> It's entirely irrelevant for this discussion. I don't know why you even bring it up.

Don't you find yourself being a bit self-contradictory?

The logic on how it handles the tags has never been sensical. If the tag_to_udev_path is fine, why should it even attempt _once_ (that could just highly fail) to resolve it with blkid? While blkid gives us reliable result, why should it "fallback" to tag_to_udev_path, which doesn't even handle a LABEL with a space (but somehow a slash)? Why doesn't it poll tags with blkid, while whoever hates the idea can just prepend /dev/disk/by-uuid/ instead of UUID= when he configures the bootloader, instead of relying on a function in an awful state?

> 1) Drop the shell-based init entirely

Yeah that's a great idea. You leave us with no fallback choice just because someone sent you a decent patch. Don't forget archiso btw.

> 2) Require udev in the initramfs

The only role udev play in the issue addressed here is that it creates those convenient symlinks. And the only reason those symlinks are relevant here is because somehow you (or whoever wrote that) think that relying (yet blkid is still attempted -> inconsistent message) on an awful function so that you can avoid polling with blkid is a better idea. Wonderful.

> N) Accept your patch

I don't have high hope on getting this patch accepted since the beginning. I just couldn't bear that we have that cute little init yet the functions are defected. Maybe you will find it beautifully written in a year or two, just like you end up fixing two of my gripes that don't even concern me much anymore.

Loading...