FS#42692 - [linux] random freeze during boot with 3.17
Attached to Project:
Arch Linux
Opened by patrick (potomac) - Wednesday, 05 November 2014, 18:26 GMT
Last edited by freswa (frederik) - Saturday, 26 September 2020, 22:56 GMT
Details
Description:
I get a random freeze roughly once every 5~10 boots with systemd 216-3 (I also tested systemd 217-5 from testing, it's the same problem). The freeze can occur shortly after the kernel is loaded: I can see these messages on screen and then nothing, it looks like a freeze:

:: running early hook [udev]
:: running hook [udev]
:: Triggering uevents...

Sometimes the freeze happens a few seconds after systemd starts (after the message "mount /home", for example).

5 minutes after the freeze this message is displayed:

task systemd-udevd:236 blocked for more than 120 seconds

and the system is still unable to boot, so I have to reset my PC.

I also tested with kernel 3.18rc3 and the bug is still there: random freeze/stop at boot with systemd.

My configuration:
- archlinux 64 bits
- cpu pentium dual core E6800 3.33 GHz
- ati radeon HD4650 PCIe (radeon open source driver, KMS early start)

I tried to disable "kernel mode setting" but the bug is still there. I created a bug report on the systemd website but I have had no answer from the developers, and I also created a bug report in the kernel's bugzilla but again I have had no answer. I took a screenshot when the error message is displayed after a freeze.

Additional info:
* package version(s): systemd 216-3 (same problem with systemd 217-5), kernel 3.17-2 (also tested with kernel 3.18rc3)
* config and/or log files etc.

Steps to reproduce:
- start your PC
- sometimes a random freeze occurs at boot (every 5~10 boots, the bug is hard to trigger)
- 5 minutes after the freeze this error message is displayed: task systemd-udevd:236 blocked for more than 120 seconds
FS#42509? Same results with kernel 3.16.x?
it's not like the bug FS#42509 because I don't use EFI, my motherboard only supports "bios" (it's an old motherboard). It could be a kernel bug, but it may also be a problem with systemd if systemd 216-3 is not fully compatible with the new features of kernel 3.17.x
No, that's not how kernel development works. Newer kernels *must* work with older userspace, or else it's considered a regression in the kernel. The opposite does not hold true, but that's a "feature" of being a userspace developer: newer userspace may not work properly with older kernels.
the kernel 3.17 ?
the problem also occurs with kernel 3.18
I tried to disable intel-ucode in grub --> same problem, so it's not a problem with intel-ucode/microcode
:: running hook [udev]
:: Triggering uevents...
worker [53] /devices/pci0000:00/0000:00:1f.2/ata8/host7/target7:0:1/7:0:1:0 is taking a long time
worker [60] /devices/pci0000:00/0000:00:1f.2/ata7/host6/target6:0:0/6:0:0:0/block/sr0 is taking a long time
https://bugzilla.kernel.org/attachment.cgi?id=156871
https://bbs.archlinux.org/viewtopic.php?pid=1473209#p1473209
it seems to be a race condition during boot which can trigger a random hang
the first bad commit is :
first bad commit: [045065d8a300a37218c548e9aa7becd581c6a0e8] [SCSI] fix qemu boot hang problem
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=045065d8a300a37218c548e9aa7becd581c6a0e8
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 9c44392..ce62e87 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1774,7 +1774,7 @@ static void scsi_request_fn(struct request_queue *q)
 	blk_requeue_request(q, req);
 	atomic_dec(&sdev->device_busy);
 out_delay:
-	if (atomic_read(&sdev->device_busy) && !scsi_device_blocked(sdev))
+	if (!atomic_read(&sdev->device_busy) && !scsi_device_blocked(sdev))
 		blk_delay_queue(q, SCSI_QUEUE_DELAY);
 }
I rebuilt kernel 3.17.2 with this patch and now the bug is gone, no problems with my PC
there is another way to solve my bug, with this patch :
--- a/drivers/scsi/scsi_lib.c 2014-10-05 21:23:04.000000000 +0200
+++ b/drivers/scsi/scsi_lib.c 2014-11-16 17:39:16.819674725 +0100
@@ -1776,7 +1776,7 @@ static void scsi_request_fn(struct reque
 	atomic_dec(&sdev->device_busy);
 out_delay:
 	if (!atomic_read(&sdev->device_busy) && !scsi_device_blocked(sdev))
-		blk_delay_queue(q, SCSI_QUEUE_DELAY);
+		blk_delay_queue(q, 40);
 }
 
 static inline int prep_to_mq(int ret)
it gives a little more time ( 40 ms instead of 3 ms, which is the default value for SCSI_QUEUE_DELAY )
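To make the difference between the two conditions in the first diff easier to see, here is a minimal user-space sketch in C. It is not kernel code: `struct queue_state`, `delay_when_busy()`, `delay_when_idle()` and `print_truth_table()` are hypothetical names made up for this illustration, and the plain `int`/`bool` fields only stand in for `atomic_read(&sdev->device_busy)` and `scsi_device_blocked(sdev)`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the state that scsi_request_fn() inspects at
 * out_delay; plain values, not the kernel's atomic_t or struct scsi_device. */
struct queue_state {
    int  device_busy; /* number of commands outstanding on the device */
    bool blocked;     /* what scsi_device_blocked() would report */
};

/* Condition on the "-" line of the first diff. */
bool delay_when_busy(struct queue_state s)
{
    return s.device_busy != 0 && !s.blocked;
}

/* Condition on the "+" line of the first diff. */
bool delay_when_idle(struct queue_state s)
{
    return s.device_busy == 0 && !s.blocked;
}

/* Print one row per combination of the two inputs. */
void print_truth_table(void)
{
    for (int busy = 0; busy <= 1; busy++) {
        for (int blocked = 0; blocked <= 1; blocked++) {
            struct queue_state s = { busy, blocked != 0 };
            bool minus = delay_when_busy(s);
            bool plus = delay_when_idle(s);

            /* The two variants never both schedule a delayed re-run. */
            assert(!(minus && plus));
            printf("busy=%d blocked=%d  '-' line: %d  '+' line: %d\n",
                   busy, blocked, minus, plus);
        }
    }
}
```

Calling `print_truth_table()` from a `main()` prints the four combinations. When the device is blocked neither variant calls `blk_delay_queue()`; when it is not blocked, exactly one of them does, depending on whether commands are still outstanding. The second patch leaves the condition alone and only changes the delay argument from `SCSI_QUEUE_DELAY` (3 ms) to 40 ms.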
I found that what triggers the bug (random hang at boot with
kernels 3.17 and 3.18) is the combination of 3 elements:
- the use of a SATA DVD burner (Liteon iHAS124 C) on an ICH7 SATA controller
- the use of a Gigabyte motherboard GA-P31-DSL3 (BIOS F10A, ICH7
controller, Intel P31 chipset)
- commit 74665016086615bbaa3fa6f83af410a0a4e029ee (scsi: convert
host_busy to atomic_t)
If I connect this SATA DVD burner and a SATA hard disk to the SATA
ports of the motherboard then the bug occurs (but only on kernels
3.17 and 3.18, there are no problems with older
kernels, and no problems with Windows 7).
If I disconnect the SATA DVD burner then the bug is gone, no problems
with kernels 3.17 and 3.18.
And if I connect the SATA DVD burner to my JMicron SATA/IDE PCIe card
then there is no problem, no bug. This is a perfect workaround for my
problem, because I can use kernel 3.17/3.18 without problems with this
configuration.
But I don't know which element I should blame. My Gigabyte motherboard
(faulty BIOS?) The use of "atomic_t" in the scsi source code (an
inappropriate way to handle SATA devices, which breaks compatibility with
some PC configurations?)
with kernel 3.17 the bug is triggered if a slow SATA device (like a DVD drive) is connected to an ICH7 SATA controller. I don't know if this problem is solved in a recent Linux kernel
Upstream Ticket:
https://bugzilla.kernel.org/show_bug.cgi?id=87581
I solved the problem on my side by using a PCIe SATA controller card and connecting my SATA DVD burner to it. I think the bug is triggered by slow SATA devices like a DVD burner when they are connected to the SATA ports on the motherboard (for some models, like Gigabyte boards with P31 or P35 chipsets)