Arch Linux

FS#75240 - [btrfs-progs] Wrong precondition in btrfs-scrub@.service silently disables scrubbing on RAID5/6

Attached to Project: Arch Linux
Opened by Andrej Podzimek (andrej) - Monday, 04 July 2022, 17:35 GMT
Last edited by Toolybird (Toolybird) - Tuesday, 02 August 2022, 08:46 GMT
Task Type Bug Report
Category Packages: Core
Status Assigned
Assigned To Tobias Powalowski (tpowa), Sébastien Luttringer (seblu)
Architecture All
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 0%
Votes 1
Private No



Here’s a famous set of recommendations for Btrfs parity raid:

One of those^^^ recommendations says: “- run scrubs on one disk at a time.”

For my Btrfs RAID array, this^^^ yields a setup like:

systemctl enable --now btrfs-scrub@dev-mapper-crypt0.timer
systemctl enable --now btrfs-scrub@dev-mapper-crypt1.timer
systemctl enable --now btrfs-scrub@dev-mapper-crypt7.timer

In most cases RandomizedDelaySec=1w prevents overlaps from happening. (And if overlaps do occur occasionally, they cause just a performance penalty, no big deal at all.)
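If the randomized delay ever proves insufficient, one alternative (not part of the original setup) is to pin each device to its own slot with a per-instance timer drop-in. This is a minimal sketch: the file name `stagger.conf` and the day and time are arbitrary assumptions. It relies on the systemd.timer(5) rule that an empty `OnCalendar=` assignment clears the schedule inherited from the template.

```shell
# Hypothetical per-instance drop-in: only affects btrfs-scrub@dev-mapper-crypt1.timer.
mkdir -p /etc/systemd/system/btrfs-scrub@dev-mapper-crypt1.timer.d
cat > /etc/systemd/system/btrfs-scrub@dev-mapper-crypt1.timer.d/stagger.conf <<'EOF'
[Timer]
# An empty assignment clears the OnCalendar= list inherited from the template.
OnCalendar=
# Run on the 8th day of every month at 03:00 (arbitrary example slot).
OnCalendar=*-*-08 03:00
# No random delay, so the slot above is deterministic.
RandomizedDelaySec=0
EOF
systemctl daemon-reload
```

Giving crypt0 and crypt7 different days in the same way keeps the scrubs apart deterministically, at the cost of losing the random spread.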

So the setup (using *device paths* instead of *mount points*) used to work fine for me. Unfortunately, at some point the preconditions in the systemd unit threw a wrench into it, without any obvious warning.

The problem is in this file: /usr/lib/systemd/system/btrfs-scrub@.service

These two lines in particular:

The scrub service (started by the scrub timer) then silently bails out with this error message:

Btrfs scrub on /dev/mapper/crypt0 was skipped because of a failed condition check (ConditionPathIsMountPoint=/dev/mapper/crypt0).

Unfortunately, this^^^ problem is far from obvious, because there is *nothing* in `systemctl --failed`. In `systemctl list-timers`, everything looks as if it were working fine. (Admittedly, I should have checked the last scrub date in `btrfs scrub status /mount/point`.) It is not a good state to be in: one assumes that everything is being scrubbed regularly, but it is not.
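Because a skipped start leaves no trace in `systemctl --failed`, a small check can surface it instead. The sketch below uses a hypothetical function name, `list_skipped`, and reads the `Id`, `ConditionResult` and `ConditionTimestamp` properties printed by `systemctl show`. It assumes that a unit whose conditions were never evaluated shows an empty or `n/a` timestamp, which may differ between systemd versions.

```shell
#!/bin/sh
# Print the btrfs-scrub@ service instances whose last start was skipped
# because a Condition*= check failed; such skips never appear in
# `systemctl --failed`.
#
# Reads `systemctl show` output on stdin: one blank-line-separated block
# per unit with the Id=, ConditionResult= and ConditionTimestamp= lines.
# Example:
#   systemctl show -p Id -p ConditionResult -p ConditionTimestamp \
#       btrfs-scrub@dev-mapper-crypt0.service | list_skipped
list_skipped() {
    awk 'BEGIN { RS = ""; FS = "\n" }
    {
        id = ""; result = ""; stamp = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^Id=/) id = substr($i, 4)
            else if ($i ~ /^ConditionResult=/) result = substr($i, 17)
            else if ($i ~ /^ConditionTimestamp=/) stamp = substr($i, 20)
        }
        # Conditions never evaluated: empty or "n/a" depending on the version.
        if (result == "no" && stamp != "" && stamp != "n/a") print id
    }'
}
```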

A quick workaround is:

# mkdir /etc/systemd/system/btrfs-scrub@.service.d
# cat > /etc/systemd/system/btrfs-scrub@.service.d/reallyscrubmyfs.conf <<- EOF
# systemctl daemon-reload

This^^^ drop-in file will disable the preconditions.
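A drop-in along these lines could look like the following. This is only a sketch under the assumption that clearing the conditions is enough, and the file name `allow-device-paths.conf` is hypothetical. Per systemd.unit(5), assigning an empty string to any `Condition*=` setting resets all previously configured conditions. Every non-Condition setting of the unit stays unchanged.

```shell
# Hypothetical drop-in name; applies to every btrfs-scrub@ instance.
mkdir -p /etc/systemd/system/btrfs-scrub@.service.d
cat > /etc/systemd/system/btrfs-scrub@.service.d/allow-device-paths.conf <<'EOF'
[Unit]
# Empty assignment: drop all Condition*= checks inherited from the shipped unit.
ConditionPathIsMountPoint=
EOF
systemctl daemon-reload
```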

Additional info:
* package version(s) btrfs-progs: 5.18.1-1
* config and/or log files etc.: N/A (Just the error message mentioned above.)
* link to upstream bug report, if any: none. When I run `paru -G btrfs-progs`, the unit file is fetched along with the PKGBUILD, so I think this is an Arch Linux-specific issue.

Steps to reproduce:

Set up a Btrfs scrub timer by device path in /dev, e.g. /dev/mapper/myluks (which is a valid argument to `btrfs scrub start ...`), instead of by mount point.
The service started by the timer will fail silently.

Some ideas (not necessarily feasible):

(a) The failures should be loud (i.e. should cause the service to fail and appear in systemctl --failed).

(b) The unit file should be split into btrfs-scrub@.service and (e.g.) btrfs-scrub-by-device@.service. The latter would have different (or no) preconditions. It could also be configured to avoid overlapping scrubs, if systemd allows that. (Global overlap avoidance would be more than good enough; it doesn't have to be perfect, i.e. per-filesystem.)

(c) The preconditions should be removed or adjusted so that device files are an option.
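Ideas (a) and (c) can be combined in a pre-check that accepts either a mount point or a block device and fails loudly otherwise. This is a sketch only: the function name `scrub_target_kind` is hypothetical, and the check relies on the `mountpoint` command from util-linux.

```shell
#!/bin/sh
# Hypothetical pre-check for a scrub target: accept a block device (such as
# /dev/mapper/crypt0) or a mount point, and return non-zero for anything else.
scrub_target_kind() {
    target=$1
    if [ -b "$target" ]; then
        # -b follows symlinks, so /dev/mapper/* links to /dev/dm-* pass.
        echo device
    elif mountpoint -q "$target" 2>/dev/null; then
        echo mountpoint
    else
        echo "not a block device or mount point: $target" >&2
        echo invalid
        return 1
    fi
}

# When executed with an argument (e.g. from ExecStartPre=), check that target;
# the script's exit status is then the function's return status.
if [ "$#" -gt 0 ]; then
    scrub_target_kind "$1"
fi
```

Installed under a hypothetical path such as `/usr/local/bin/btrfs-scrub-check`, it could be wired in through a drop-in that clears the inherited conditions with an empty `ConditionPathIsMountPoint=` and adds `ExecStartPre=/usr/local/bin/btrfs-scrub-check %f`. A bad target would then make the service fail and show up in `systemctl --failed`, while device paths would be scrubbed.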
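Idea (b) could look roughly like this. It assumes a hypothetical `btrfs-scrub-by-device@.service` template with no mount-point condition, where flock(1) from util-linux serialises all instances on one lock file as a simple form of global overlap avoidance. The lock path, the priority settings, and the location of the shipped timer template (`/usr/lib/systemd/system/btrfs-scrub@.timer`) are illustrative assumptions.

```shell
# Hypothetical template unit for scrubbing by device path.
cat > /etc/systemd/system/btrfs-scrub-by-device@.service <<'EOF'
[Unit]
Description=Btrfs scrub on device %f
# Deliberately no ConditionPathIsMountPoint=: %f is a device path here.

[Service]
# Run the scrub at low CPU and I/O priority.
Nice=19
IOSchedulingClass=idle
# Stop the scrub with SIGINT rather than the default SIGTERM.
KillSignal=SIGINT
# All instances share one lock file, so only one scrub started through this
# template runs at a time; the others wait in flock(1) until it is released.
ExecStart=/usr/bin/flock /run/btrfs-scrub-by-device.lock /usr/bin/btrfs scrub start -B %f
EOF

# A timer instance activates the service instance with the same name, so a
# copy of the shipped timer template (path assumed) is enough to schedule it.
cp /usr/lib/systemd/system/btrfs-scrub@.timer \
   /etc/systemd/system/btrfs-scrub-by-device@.timer
systemctl daemon-reload
systemctl enable --now btrfs-scrub-by-device@dev-mapper-crypt0.timer
```

Waiting on the lock instead of skipping keeps every scheduled scrub; the trade-off is that a long scrub delays the next device's run.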

Comment by Andrej Podzimek (andrej) - Monday, 04 July 2022, 17:38 GMT
Oh and I think there’s a bug in Flyspray. :-) It clobbered the link above. Here’s the link without https:// and in ``, so that it hopefully doesn’t get linkified and clobbered: ``