FS#25474 - [linux] Loading aacraid takes more than 60 seconds
Attached to Project:
Arch Linux
Opened by Joker-jar (Joker-jar) - Tuesday, 09 August 2011, 08:27 GMT
Last edited by Tom Gundersen (tomegun) - Friday, 18 November 2011, 13:44 GMT
Details
The ramdisk hangs during boot. After ~1 minute it displays the
message: "udevd[108]: timeout: killing: ..." (see attached
image). This problem appeared after upgrading to the 3.0 kernel.
The fallback image has this problem too.
Closed by Tom Gundersen (tomegun)
Friday, 18 November 2011, 13:44 GMT
Reason for closing: None
Additional comments about closing: Works for user, patch exists.
archboot.jpg
Btw, is there anything special in your MODULES or modprobe.conf?
The module in question has alias pci:v00009005d00000285sv00009005sd000002D5bc01sc04i00 and the module name is aacraid.
It appears the system can boot without this module being loaded. Maybe it would be worth adding --exit-if-exists=<whatever node we need to continue> to udevadm settle, so we don't wait for unnecessary stuff?
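A minimal sketch of what that settle call could look like. The device node /dev/sda1 and the 30 second timeout are placeholders chosen for illustration, not values taken from this report; the command is only echoed here so the snippet is harmless on a machine without udev.

```shell
# Hypothetical node that boot actually needs; substitute your own.
ROOT_NODE=/dev/sda1

# Wait for the udev event queue to drain, but stop waiting as soon as
# the node exists, or after 30 seconds at the latest.
settle_cmd="udevadm settle --timeout=30 --exit-if-exists=$ROOT_NODE"

# Print the command instead of running it; run it by hand once the
# node name is right.
echo "$settle_cmd"
```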
@Joker-jar:
Please post your /etc/mkinitcpio.conf. Also: are you up to date with the latest packages from [core], or do you have some custom packages (in particular custom kernel)?
Anyway, this is likely a kernel bug, dmesg will tell.
$ lspci | grep -i adaptec
01:00.0 RAID bus controller: Adaptec AAC-RAID (rev 09)
P.S. Sorry for my english :)
(In order to rmmod you probably have to mount from a livecd, or get mount into sh in your initramfs, so that your hdd will not be mounted).
$ /lib/initcpio/busybox rmmod
BusyBox v1.18.4 (2011-05-14 12:38:46 CEST) multi-call binary.
Usage: rmmod [-wfa] [MODULE]..
in addition:
$ lsmod | grep aacraid
aacraid 79005 4
scsi_mod 131482 7 usb_storage,uas,sg,sd_mod,sr_mod,libata,aacraid
I guess livecd is no good. So booting into the initrd is probably the best option. Not sure what is the best way, but I'd do something like this: add rdinit=/bin/sh to your kernel command line. When booted mount /proc and /sys and then call modprobe to see what happens.
Kernel panic - not syncing: VFS: unable to mount root fs on unknown_block(1,0)
rdinit=/bin/busybox sh
there is no /bin/sh on the initramfs prior to running /bin/busybox --install -s (which is the first thing that's called in /init).
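Putting the suggestions above together, the debugging session inside the initramfs might look roughly like this. It is a sketch, assuming the system was booted with rdinit=/bin/busybox sh and that the busybox build in the image provides the mount, time, dmesg and tail applets. The function name debug_aacraid is made up for this example; the wrapper just keeps anything from running until it is called.

```shell
# Intended for the shell reached via "rdinit=/bin/busybox sh".
debug_aacraid() {
    # Create the applet symlinks (/bin/sh, mount, ...), as /init would.
    /bin/busybox --install -s

    # modprobe and dmesg need these pseudo filesystems.
    mount -t proc proc /proc
    mount -t sysfs sys /sys

    # Load the driver verbosely and measure how long it takes.
    time modprobe -v aacraid

    # Show the most recent kernel messages from the driver load.
    dmesg | tail -n 20
}
```

Calling debug_aacraid at the prompt then runs the steps in order; the same commands can also be typed one by one.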
The problem appeared after this update (I am 100% sure): http://pastebin.com/tKwTadHv
To reproduce, maybe the best thing is to boot with the most recent arch iso, as it should have linux 3.0: http://releng.archlinux.org/isos/2011.08.11/archlinux-2011.08.11-netinstall-x86_64.iso .
Then I tried to boot with the parameter "modprobe.blacklist=aacraid". The system loads quickly, but the RAID partitions, of course, were not available.
# modprobe -bv pci:v00009005d00000285sv00009005sd000002D5bc01sc04i00
Module pci:v00009005d00000285sv00009005sd000002D5bc01sc04i00 not found.
# time modprobe aacraid
real 1m1.673s
user 0m0.100s
sys 0m0.003s
Your timing is now a bit more than one second over the timeout (60 sec), so it would be good to know how big the difference was (if it was 1 sec or 59 secs before the upgrade).
Would be nice if you could reproduce the timing numbers (with good and bad kernel) now that your BIOS is updated, and post to the kernel bugzilla.
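For comparing such measurements against the timeout, something like the following could be used. It is only a sketch: the helper name to_seconds is invented here, and the 60 second figure is the timeout quoted above.

```shell
# Convert a "real" value as printed by time, e.g. "1m1.673s", to seconds.
to_seconds() {
    printf '%s\n' "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

UDEV_TIMEOUT=60   # seconds

elapsed=$(to_seconds "1m1.673s")
overshoot=$(awk -v e="$elapsed" -v t="$UDEV_TIMEOUT" \
    'BEGIN { printf "%.3f\n", e - t }')

echo "modprobe took ${elapsed}s, ${overshoot}s over the ${UDEV_TIMEOUT}s timeout"
```

Running the same conversion on the "real" line from a good kernel would show directly whether the load time moved by about one second or by most of a minute.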
Could you post a link to your bug report?
https://bugzilla.kernel.org/show_bug.cgi?id=40932
I think you can change the timeout by replacing
DRIVER!="?*", ENV{MODALIAS}=="?*", RUN+="/sbin/modprobe -bv $env{MODALIAS}"
in /lib/udev/rules.d/80-drivers.rules, with
DRIVER!="?*", ENV{MODALIAS}=="?*", RUN+="/sbin/modprobe -bv $env{MODALIAS}", OPTIONS+="event_timeout=60"
(where 60 is replaced with the number of seconds you want).
I have not tested this so maybe it is not exactly correct, have a look in "man udev" for more info.
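The edit could be scripted along these lines. This is an untested-on-real-hardware sketch that works on a scratch copy of the rule rather than on /lib/udev/rules.d/80-drivers.rules itself; the value 120 is an arbitrary example. As far as I know, a file of the same name placed in /etc/udev/rules.d takes precedence over the one in /lib/udev/rules.d, which would avoid touching the packaged file, but treat that as an assumption to check in "man udev".

```shell
# Scratch copy containing just the rule in question.
rules=$(mktemp)
cat > "$rules" <<'EOF'
DRIVER!="?*", ENV{MODALIAS}=="?*", RUN+="/sbin/modprobe -bv $env{MODALIAS}"
EOF

TIMEOUT=120   # example value in seconds, pick your own

# Append the OPTIONS key to the modprobe rule only; other lines are untouched.
sed 's|\(RUN+="/sbin/modprobe -bv \$env{MODALIAS}"\)$|\1, OPTIONS+="event_timeout='"$TIMEOUT"'"|' \
    "$rules" > "$rules.new"

cat "$rules.new"
```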
Out of curiosity, I don't think I have seen the time measurement on a working kernel. If it is not too much effort, could you paste the output of "# time modprobe aacraid" after booting a working kernel with "modprobe.blacklist=aacraid"?
I'll try "# time modprobe aacraid" with a working kernel as soon as I buy a blank CD :)
I ran the kexec daemon and tried to reboot. The system hangs with these messages:
Triggering uevents...
irq 19: nobody cared (...)
handlers:
usb_hcd_ing
usb_hcd_ing
disabling IRQ 19
After that (~1 min) comes the message from the screenshot (timeout, killing modprobe). IRQ 19? Maybe this is a conflict?
commit cf16123c9c8e346ed1dd171295a678d77648d7f8
Author: Vasily Averin <vvs@parallels.com>
Date: Fri Nov 11 13:42:16 2011 +0400
[SCSI] aacraid: controller hangs if kernel uses non-default ASPM policy
Aacraid controller can hang on some nodes if kernel uses non-default
(powersave) ASPM policy. Controller hangs shortly after successful load and
hardware detection. Scsi error handler detects this hang and tries to restart
hardware but it does not help.
Initially it was noticed on RHEL6-based openVZ kernel after backporting
aacraid driver from mainline (RHEL6 kernel with original driver works well)
http://bugzilla.openvz.org/show_bug.cgi?id=2043
This issue happens because default ASPM policy was changed in Red Hat
kernels. Therefore guys from Red Hat have noticed this problem long time ago:
on Fedora 12
https://bugzilla.redhat.com/show_bug.cgi?id=540478
on Fedora 14
https://bugzilla.redhat.com/show_bug.cgi?id=679385
In RHEL6 kernel this issue was fixed, ASPM was disabled in aacraid driver. In
kernel changelog I've found that seems it was done by Matthew Garrett: -
[scsi] aacraid: Disable ASPM by default (Matthew Garrett) [599735]
However seems this patch was not submitted to mainline. I've reproduced this
issue on vanilla 3.1.0 kernel booted with "pcie_aspm.policy=powersave" option,
So I believe it makes sense to do it now.
Signed-off-by: Vasily Averin <vvs@sw.ru>
[mjg: Checking the Windows drivers indicates that they disable ASPM under all
circumstances, so:]
Acked-by: Matthew Garrett <mjg@redhat.com>
Acked-by: Achim Leubner <Achim_Leubner@pmc-sierra.com>
Cc: stable@kernel.org
Signed-off-by: James Bottomley <JBottomley@Parallels.com>