FS#40454 - [linux] 3.13.x - 3.15.x mptscsi taking longer than 30 seconds to probe resulting in boot fail

Attached to Project: Arch Linux
Opened by Jason Begley (jayray) - Monday, 19 May 2014, 18:00 GMT
Last edited by Dave Reisner (falconindy) - Saturday, 13 September 2014, 14:14 GMT
Task Type Bug Report
Category Upstream Bugs
Status Closed
Assigned To Tobias Powalowski (tpowa)
Thomas Bächler (brain0)
Architecture x86_64
Severity Critical
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description: [linux] mptscsi taking longer than 30 seconds to probe resulting in boot fail.
After upgrading to kernel 3.13 or later, the kernel oopses at around the 30-second mark because of a timeout while waiting on mptscsi.


Additional info:
* Any kernel 3.13+
* Hardware Dell 1950 PERC6


Steps to reproduce:
Hardware specific. Matching hardware and kernel version reproduces the findings. These issues are detailed in:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1276705

Closed by  Dave Reisner (falconindy)
Saturday, 13 September 2014, 14:14 GMT
Reason for closing:  Upstream
Additional comments about closing:  Nothing for Arch to do here. See upstream discussion about udev timeouts and kernel regressions:

http://lists.freedesktop.org/archives/systemd-devel/2014-September/022923.html
Comment by Jason Begley (jayray) - Monday, 26 May 2014, 04:02 GMT

Additional info: This seems to be an issue with both Kernel 3.13+ and systemd.

Excerpts from https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1297248

(1) Currently finit_module() of mptsas kernel module does need more than
30 seconds to initialize LSI SAS1068E disk.

(2) Currently systemd-udevd unconditionally sends SIGKILL upon hardcoded
30 seconds timeout. As a result, finit_module() of mptsas kernel
module receives SIGKILL when waiting for error handler thread to be
started.

(3) Before commit 786235ee was applied, finit_module() receiving SIGKILL
was no problem because kthread_create() ignored SIGKILL when waiting
for error handler thread to be started. But after commit 786235ee was
applied, finit_module() receiving SIGKILL is a problem because
kthread_create() no longer ignores SIGKILL when waiting for error
handler thread to be started. As a result, finit_module() of mptsas
kernel module failed to initialize LSI SAS1068E disk, leading to
a boot failure.

Commit 786235ee was meant for helping OOM killer to terminate the victim
process immediately when the victim process is unable to be terminated
due to waiting for kthreadd process to complete memory allocation.

Kernel developers think that it is a systemd's bug because any thread
who received SIGKILL has a right to terminate immediately. Therefore,
reverting commit 786235ee is not acceptable for kernel developers.

On the other hand, systemd developers think that it is a kernel's bug
because finit_module() should return within 30 seconds. Therefore,
changing to longer timeout is not acceptable for systemd developers.

Since there was no time to wait for systemd to allow longer timeout,
Bug #1276705 used a SAUCE patch that allows kthread_create() to ignore
SIGKILL up to 10 seconds. We used a SAUCE patch for Ubuntu 14.04, but
we don't want to carry this SAUCE patch forever.
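
The timeout-then-SIGKILL interaction described in points (1) and (2) above can be illustrated in user space. The following is only a rough sketch under stated assumptions: the function name `run_probe_with_timeout` is hypothetical, a sleeping child process stands in for the slow mptsas probe, and the timings are scaled down from 30 seconds. It does not reproduce the kernel-side kthread_create() behaviour from point (3), and it is not how systemd-udevd is implemented.

```python
import signal
import subprocess
import sys


def run_probe_with_timeout(probe_seconds, timeout_seconds):
    """Run a stand-in 'probe' and SIGKILL it if it outlives the timeout.

    Hypothetical helper for illustration only (POSIX systems).
    Returns (killed, returncode) for the worker process.
    """
    # The worker just sleeps; it stands in for a slow driver probe.
    worker = subprocess.Popen(
        [sys.executable, "-c", f"import time; time.sleep({probe_seconds})"]
    )
    try:
        # The supervisor waits a fixed time for the worker to finish.
        worker.wait(timeout=timeout_seconds)
        killed = False
    except subprocess.TimeoutExpired:
        # Timeout hit: kill unconditionally, however far the probe got.
        worker.send_signal(signal.SIGKILL)
        worker.wait()
        killed = True
    return killed, worker.returncode
```

For example, `run_probe_with_timeout(5, 0.3)` kills the worker before its "probe" completes, while `run_probe_with_timeout(0, 10)` lets it finish normally. On POSIX systems a worker terminated by SIGKILL reports a return code of `-signal.SIGKILL`.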
Comment by Jason Begley (jayray) - Tuesday, 27 May 2014, 01:07 GMT
I applied a fix used by the Debian Kernel maintainers to fix this issue in our kernel. See attached for patch and modified PKGBUILD used. Tested with 3 reboots for certainty. Could we see this in the next release?
Comment by Dave Reisner (falconindy) - Tuesday, 27 May 2014, 01:19 GMT
> Could we see this in the next release?
Why isn't this patch upstream?
Comment by Jason Begley (jayray) - Tuesday, 27 May 2014, 02:24 GMT
Mptscsi works fine except for the long delay in probing, or whatever it is doing that exceeds the 30-second time limit newly enforced by udev/systemd. So to me it is unclear which is to blame, systemd or the kernel. I think the new behavior in systemd has uncovered the issue, but it shouldn't be allowed to kill the probe/mptscsi altogether. Please see the comments above, taken from Ubuntu's reports of the same issue. This was their fix.
