FS#57496 - [linux] 4.15.2-2 bfq: does not handle requeue

Attached to Project: Arch Linux
Opened by loqs (loqs) - Tuesday, 13 February 2018, 20:52 GMT
Last edited by Jan Alexander Steffens (heftig) - Wednesday, 14 February 2018, 01:52 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To Tobias Powalowski (tpowa)
Jan Alexander Steffens (heftig)
Architecture All
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:
From http://lkml.org/lkml/2018/2/7/530
Commit 'a6a252e64914 ("blk-mq-sched: decide how to handle flush rq via
RQF_FLUSH_SEQ")' makes all non-flush re-prepared requests for a device
be re-inserted into the active I/O scheduler for that device. As a
consequence, I/O schedulers may get the same request inserted again,
even several times, without a finish_request invoked on that request
before each re-insertion.

This fact is the cause of the failure reported in [1]. For an I/O
scheduler, every re-insertion of the same re-prepared request is
equivalent to the insertion of a new request. For schedulers like
mq-deadline or kyber, this fact causes no harm. In contrast, it
confuses a stateful scheduler like BFQ, which keeps state for an I/O
request, until the finish_request hook is invoked on the request. In
particular, BFQ may get stuck, waiting forever for the number of
request dispatches, of the same request, to be balanced by an equal
number of request completions (while there will be one completion for
that request). In this state, BFQ may refuse to serve I/O requests
from other bfq_queues. The hang reported in [1] then follows.
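
The imbalance described above can be illustrated with a toy model. This is only a sketch and not BFQ code: the class `ToyStatefulScheduler`, its `dispatched` counter, and the `simulate` helper are hypothetical names invented for illustration. It assumes a scheduler that counts every dispatch and expects each one to be matched by a finish_request call.

```python
class ToyStatefulScheduler:
    """Toy model (NOT BFQ): counts dispatches until finish_request balances them."""

    def __init__(self):
        # Hypothetical counter: dispatches not yet balanced by a completion.
        self.dispatched = 0

    def insert_request(self, rq):
        # A re-inserted request looks exactly like a new one to the scheduler.
        return rq

    def dispatch_request(self, rq):
        self.dispatched += 1

    def finish_request(self, rq):
        self.dispatched -= 1

    def can_serve_other_queues(self):
        # The toy scheduler only moves on once every dispatch is balanced.
        return self.dispatched == 0


def simulate(requeues):
    """Dispatch one request, requeue it `requeues` times, then complete it once."""
    sched = ToyStatefulScheduler()
    rq = "rq0"
    sched.insert_request(rq)
    sched.dispatch_request(rq)
    for _ in range(requeues):
        # Re-prepared request is re-inserted with no finish_request in between.
        sched.insert_request(rq)
        sched.dispatch_request(rq)
    # There is only one completion for the request.
    sched.finish_request(rq)
    return sched
```

With no requeue the counter returns to zero; with any requeue it stays positive, which in this model corresponds to the scheduler waiting forever and refusing to serve other queues.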

This has been fixed in commit a7877390614770965a6925dfed79cbd3eeeb61e0, but that commit was not marked for stable and does not apply cleanly to 4.15.

Additional info:
http://lkml.org/lkml/2018/2/7/530
https://bugzilla.kernel.org/show_bug.cgi?id=198705
https://bbs.archlinux.org/viewtopic.php?id=234070
https://bbs.archlinux.org/viewtopic.php?id=234363
bfq2.patch is an attempted backport of a7877390614770965a6925dfed79cbd3eeeb61e0. It appears to resolve the issue,
but should be treated as a proof of concept rather than a suggested fix because of the amount of manual merging it required.

Steps to reproduce:
1. Boot with scsi_mod.use_blk_mq=1
2. Use a udev rule such as ACTION=="add|change", KERNEL=="sd*", ATTR{queue/scheduler}="bfq" to ensure SCSI devices use bfq
3. Insert a USB thumb drive and wait for it to be detected
4. Run blkid as root; the process will hang in the D state
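
Before running blkid, it may help to confirm that the udev rule actually selected bfq. The following is a minimal sketch, assuming the usual sysfs layout in which /sys/block/<dev>/queue/scheduler lists the available schedulers with the active one in square brackets. The helper names `active_scheduler` and `scsi_disk_schedulers` are made up for this example.

```python
from pathlib import Path


def active_scheduler(text):
    """Return the bracketed scheduler name from a queue/scheduler string, or None."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    # No bracketed entry found (for example, a bare "none").
    return None


def scsi_disk_schedulers(sys_block="/sys/block"):
    """Map each sd* device under sys_block to its active scheduler."""
    result = {}
    root = Path(sys_block)
    if not root.is_dir():
        return result
    for dev in sorted(root.glob("sd*")):
        try:
            text = (dev / "queue" / "scheduler").read_text()
        except OSError:
            # Device vanished or has no scheduler attribute; skip it.
            continue
        result[dev.name] = active_scheduler(text)
    return result


if __name__ == "__main__":
    for name, sched in scsi_disk_schedulers().items():
        print(name, sched)
```

If the thumb drive does not report bfq here, the hang described in this report is not expected to reproduce.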

Closed by Jan Alexander Steffens (heftig)
Wednesday, 14 February 2018, 01:52 GMT
Reason for closing: Won't fix
Comment by Jan Alexander Steffens (heftig) - Wednesday, 14 February 2018, 01:52 GMT
blk-mq still has way too many issues and won't be considered for patching. Please boot without use_blk_mq.

Alternatively, the 'bfq-mq' elevator in linux-zen is an up-to-date BFQ.
