FS#66526 - [linux] failing partition scan on loop devices with -EBUSY

Attached to Project: Arch Linux
Opened by Frantisek Sumsal (mrc0mmand) - Sunday, 03 May 2020, 18:21 GMT
Last edited by Jan Alexander Steffens (heftig) - Thursday, 07 May 2020, 21:56 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To Tobias Powalowski (tpowa)
Jan Alexander Steffens (heftig)
Levente Polyak (anthraxx)
Architecture All
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:
Since kernel 5.6.8 the handling of loop devices seems to be partially broken: after a loop device is set up, the kernel complains about failing to read its partition table, and the loop device is loaded as unpartitioned, which is incorrect. This also happens on 5.6.10 from testing.

I did a few tests on 5.7.0 (currently provided by Fedora Rawhide) and the handling seems to work again, even though the kernel still complains (which isn't reassuring).

Additional info:
* package version(s)
linux 5.6.8.arch1-1 (core)
linux 5.6.10.arch1-1 (testing)

Steps to reproduce (linux 5.6.10.arch1-1):
# dd if=/dev/zero of=/disk.img bs=1M count=100
# losetup --show -P -f /disk.img
/dev/loop0
# printf ",10M\n,\n" | sfdisk /dev/loop0
...
Device Boot Start End Sectors Size Id Type
/dev/loop0p1 2048 22527 20480 10M 83 Linux
/dev/loop0p2 22528 204799 182272 89M 83 Linux

The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
# dmesg | tail
...
[ 606.892705] loop_reread_partitions: partition scan of loop1 (/disk.img) failed (rc=-16)
[ 611.236442] __loop_clr_fd: partition scan of loop0 failed (rc=-16)
[ 611.985511] loop_reread_partitions: partition scan of loop0 (/disk.img) failed (rc=-16)
[ 630.835052] loop0: p1 p2
# losetup -d /dev/loop0
# losetup --show -P -f /disk.img
/dev/loop0
# lsblk /dev/loop0
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 100M 0 loop
# dmesg | tail
...
[ 695.772141] loop_reread_partitions: partition scan of loop0 (/disk.img) failed (rc=-16)
# blockdev --rereadpt /dev/loop0
# lsblk /dev/loop0
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 100M 0 loop
├─loop0p1 259:0 0 10M 0 part
└─loop0p2 259:1 0 89M 0 part
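
For automation, the "loaded as unpartitioned" state shown above could be detected without parsing `lsblk` output. This is a minimal sketch, assuming sysfs is mounted at /sys and that partitions appear as `<dev>p<N>` directories below the device entry, as the `loop0p1`/`loop0p2` names above suggest. `count_parts` is a hypothetical helper, not part of util-linux:

```shell
#!/bin/sh
# count_parts DEV [SYSFS_BLOCK_DIR]
# Hypothetical helper: print the number of partition entries the kernel
# currently exposes for DEV (e.g. loop0) as DEV/DEVp* directories.
# SYSFS_BLOCK_DIR defaults to /sys/block (assumption: sysfs mounted at /sys).
count_parts() {
    dev=$1
    base=${2:-/sys/block}
    count=0
    for p in "$base/$dev/${dev}p"*; do
        # An unmatched glob stays literal, so only count real directories.
        if [ -d "$p" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Example: after `losetup --show -P -f /disk.img` printed /dev/loop0,
# `count_parts loop0` printing 0 would mean the partition scan was missed.
```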

Closed by  Jan Alexander Steffens (heftig)
Thursday, 07 May 2020, 21:56 GMT
Reason for closing:  Fixed
Additional comments about closing:  5.6.11.arch1-1
Comment by Eicke Herbertz (WolleTD) - Monday, 04 May 2020, 14:28 GMT
I can confirm this. However, `blockdev --rereadpt /dev/loop0` fails with "blockdev: ioctl error on BLKRRPART: Invalid argument", on a GPT image at least.
Comment by Frantisek Sumsal (mrc0mmand) - Monday, 04 May 2020, 14:32 GMT
Ah, thanks, I forgot to mention that - the `blockdev` "fix" indeed fails intermittently with `blockdev: ioctl error on BLKRRPART: Device or resource busy` (at least it did when I tried it in our automation; when run manually, it worked fine).
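Since the `BLKRRPART` failure reported here is intermittent, automation could retry the rescan a few times before giving up. This is only a sketch, assuming the busy state is transient. `retry_cmd` is a hypothetical helper, and the attempt count and delay are arbitrary:

```shell
#!/bin/sh
# retry_cmd ATTEMPTS DELAY CMD [ARGS...]
# Hypothetical helper: run CMD until it succeeds or ATTEMPTS runs have failed.
# Returns 0 on the first success, 1 if every attempt failed.
retry_cmd() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Do not sleep after the final failed attempt.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Example (needs root and an attached loop device):
# retry_cmd 5 1 blockdev --rereadpt /dev/loop0
```

This only papers over the symptom. It does not help with the "Invalid argument" failure mentioned in the previous comment.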
Comment by Eicke Herbertz (WolleTD) - Monday, 04 May 2020, 17:30 GMT
That's odd, because I really thought that even losetup's partition scan worked for me once or twice with 5.6.8. I specifically remember deleting the image file after the error and generating a new one, and that worked.

Assuming this was fine in 5.6.7 (I jumped from 5.6.4 to 5.6.8), the following change looks more than suspicious, as it directly changes the condition under which -EBUSY is returned:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.6.y&id=6b97491e6ca7c1352aa6ffb1e4a18fc90d20e59d
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.6.y&id=8ecb61c6fbb623339c28bb9d5427568c1254f4bb

loop_reread_partitions() calls bdev_disk_changed() which calls blk_drop_partitions(). While I don't know the details (yet), I guess loop devices can (and will, apparently) indeed have multiple "openers".
That would also explain why it's working sometimes.

There's a fix in master already though: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/block/partitions/core.c?id=10c70d95c0f2f9a6f52d0e33243d2877370cef51
Comment by Eicke Herbertz (WolleTD) - Wednesday, 06 May 2020, 19:46 GMT
Comment by Frantisek Sumsal (mrc0mmand) - Thursday, 07 May 2020, 21:38 GMT
After giving 5.6.11.arch1-1 a spin (several dozen attempts or so), it indeed looks like the issue is gone. Thanks!
