FS#23725 - [kernel26] Since update to 2.6.38-2 udev's ata_id slows the system to a crawl

Attached to Project: Arch Linux
Opened by heiko (heiko) - Tuesday, 12 April 2011, 21:19 GMT
Last edited by Andrea Scarpino (BaSh) - Saturday, 30 April 2011, 11:33 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To Tobias Powalowski (tpowa)
Thomas Bächler (brain0)
Architecture x86_64
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 2
Private No

Details

Description:
Since the update of kernel26 from 2.6.37-5 to 2.6.38-2 my PC (x86_64) is incredibly slow - bootup time multiplied by a factor of ten to twenty, maybe even more.

The slowdown begins during the udev module-loading stage of my initramfs. At the same time my CD-writer starts making a constant ticking sound and no longer opens its tray. The kernel prints the following error messages about five or six times per second, and when the system eventually boots up, it is still pretty sluggish:

scsi2: Issued Channel A Bus Reset. 1 SCBs aborted
(scsi2:A:0:0): No or incomplete CDB sent to device.

The probable cause is udev's /lib/udev/ata_id, called by the rule tagged "ATA/ATAPI devices using the "scsi" subsystem" in /lib/udev/rules.d/60-persistent-storage.rules for my pure-SCSI CD-writer Plextor PX-W124TS on my Adaptec 2940-something SCSI-card using the aic7xxx driver.

I'm totally unsure whether this is a kernel bug (should abort somehow), a udev bug (should not call ata_id on a SCSI device) or a packaging bug (causing udev to call ata_id on a SCSI device in the first place) so I'm reporting it here first...

Downgrading the kernel26 back to 2.6.37-5 magically solves the problem. I'm not sure whether the same happens with i686, so I set the architecture field to x86_64 for now.

I am willing to test new kernel versions once they are out, patched versions of udev, anything. Might take a day or two though, time can be short. Might start some testing for myself tomorrow, such as disabling that particular udev rule for a start.

Regards, Heiko

Steps to reproduce:
1) Run a 2.6.38-2 kernel with initramfs on a system containing an Adaptec 2940 with a Plextor PX-W124TS attached.*
2) Watch it spit out error messages.

*My guess is that this is going to happen as well if you don't use an initramfs, only later during the "regular" bootup stage. Can't test due to encrypted root filesystem. And probably also on other SCSI cards, maybe even on other CD writers - can't test either due to lack of both...

Closed by  Andrea Scarpino (BaSh)
Saturday, 30 April 2011, 11:33 GMT
Reason for closing:  Fixed
Additional comments about closing:  kernel26 2.6.38.3-1
Comment by Willie (bananabrain) - Friday, 15 April 2011, 13:20 GMT
Just to confirm - problems also present with i686.

Hardware - Intel D865GBF Mobo/Adaptec AHA2940AU/Plextor PX-40TSi.

We have the same SCSI config. Machine won't boot at all - just gets stuck in an endless repetition of:

scsi2: Issued Channel A Bus Reset. 1 SCBs aborted
(scsi2:A:0:0): No or incomplete CDB sent to device.

Previous kernel and Arch core ISO both will boot and work fine.

EDIT:
Swapped out AHA2940 for a Mylex Flashpoint LT (BT-930R) and all works well, suggesting the problem does lie with the aic7xxx driver in the 2.6.38 kernel.

EDIT 2:
Kernel 2.6.38.3 released today, one week after bug report. Problem has disappeared (for me). Excellent. Thanks Tobias/Thomas.
Comment by Arthur Huillet (ahuillet) - Saturday, 16 April 2011, 07:54 GMT
I have the very same symptoms (boot time increased by a huge factor), but no error messages in the console, and pressing a key (!) during bootup seems to speed it up. This is likely to be another issue.
Comment by Bill Fraser (wrf) - Wednesday, 20 April 2011, 08:38 GMT
I'm having identical symptoms to heiko (cd drive light blinks; won't open; errors spew to the console), and I just upgraded to 2.6.38.3-1 and am still having the issue.

# uname -a
Linux smokey 2.6.38-ARCH #1 SMP PREEMPT Sun Apr 17 14:51:34 UTC 2011 i686 Pentium III (Coppermine) GenuineIntel GNU/Linux

# pacman -Q kernel26
kernel26 2.6.38.3-1

The CD drive in question:

# lsscsi --verbose
[0:0:5:0] cd/dvd NEC CD-ROM DRIVE:466 1.06 /dev/sr0
dir: /sys/bus/scsi/devices/0:0:5:0 [/sys/devices/pci0000:00/0000:00:02.0/0000:01:06.0/host0/target0:0:5/0:0:5:0]
...

It's on an Adaptec aic7880 ultrascsi card, using driver aic7xxx.

From dmesg:

[ 19.297170] (scsi0:A:5:0): No or incomplete CDB sent to device.
[ 19.300480] scsi0: Issued Channel A Bus Reset. 1 SCBs aborted
(over and over again)

This continues until I hit alt-sysrq-E to terminate all tasks, and only then will my boot continue.

I can give more logs if needed.
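
For anyone who wants to put a number on "over and over again", here is a minimal sketch that counts the bus-reset messages per second of uptime. It assumes the timestamped dmesg format shown above; the function name `resets_per_second` is made up for illustration and is not part of any tool.

```python
import re
from collections import Counter

# Matches lines such as:
# "[ 19.300480] scsi0: Issued Channel A Bus Reset. 1 SCBs aborted"
# Assumption: dmesg timestamps are enabled, as in the excerpt above.
RESET_RE = re.compile(
    r"^\[\s*(\d+)\.\d+\]\s+(scsi\d+): Issued Channel \w+ Bus Reset"
)


def resets_per_second(lines):
    """Count bus-reset messages per (SCSI host, whole second of uptime)."""
    counts = Counter()
    for line in lines:
        match = RESET_RE.match(line)
        if match:
            second, host = int(match.group(1)), match.group(2)
            counts[(host, second)] += 1
    return counts


if __name__ == "__main__":
    # The two sample lines are copied from the dmesg excerpt in this comment.
    sample = [
        "[ 19.297170] (scsi0:A:5:0): No or incomplete CDB sent to device.",
        "[ 19.300480] scsi0: Issued Channel A Bus Reset. 1 SCBs aborted",
    ]
    for (host, second), n in sorted(resets_per_second(sample).items()):
        print(f"{host} t={second}s resets={n}")
```

Feeding it the full dmesg output (one line per list element) instead of the two sample lines would show whether the reset rate matches the five or six per second reported in the original description.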
Comment by Jelle van der Waa (jelly) - Wednesday, 20 April 2011, 08:41 GMT
Did you guys ever check upstream or report the bug?
Comment by Willie (bananabrain) - Wednesday, 20 April 2011, 23:36 GMT

Jelly,

This is the most intimate I've ever been with a kernel bug.

From my viewpoint, a fairly serious breakage in the first Arch 2.6.38 kernel was fixed with the subsequent 2.6.38.3 release.
I could find no mention of this aic7xxx problem in LKML and the checksums of all the aic7xxx source code in both releases are identical.

Apart from that I wouldn't know how else to investigate the issue. Any advice would be appreciated.
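
One way to repeat the checksum comparison mentioned above is sketched below. It hashes every file under two directory trees and lists the paths that differ. The directory arguments are placeholders for wherever the two kernel source trees are unpacked, and the helper names are made up for illustration. An empty result would only show that the driver's own files are unchanged, not where the behaviour change came from.

```python
import hashlib
import sys
from pathlib import Path


def tree_digests(root):
    """Map each file path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def differing_files(old_root, new_root):
    """Return relative paths that are missing on one side or differ in content."""
    old, new = tree_digests(old_root), tree_digests(new_root)
    return sorted(
        name for name in old.keys() | new.keys()
        if old.get(name) != new.get(name)
    )


if __name__ == "__main__":
    # Hypothetical usage: pass the aic7xxx driver directory of each
    # unpacked kernel tree as the two arguments.
    if len(sys.argv) == 3:
        changed = differing_files(sys.argv[1], sys.argv[2])
        print("\n".join(changed) if changed else "no differences")
```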
Comment by Tom Gundersen (tomegun) - Saturday, 23 April 2011, 00:47 GMT
This is a shot in the dark, but could this be related: <http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=7eec77a1816a7042591a6cbdb4820e9e7ebffe0e>?

ide: unexport DISK_EVENT_MEDIA_CHANGE for ide-gd and ide-cd

check_events() implementations in both ide-gd and ide-cd are
inadequate for in-kernel event polling. Both generate media change
events continuously when certain conditions are met causing infinite
event loop between the driver and userland event handler.

As disk event now supports suppression of unlisted events, simply
de-listing DISK_EVENT_MEDIA_CHANGE from disk->events resolves the
problem. Internal handling around media revalidation will behave the
same while userland will fall back to userland event polling after
detecting the device doesn't support disk events.
Comment by Frank Phillips (fphillips) - Sunday, 24 April 2011, 03:09 GMT
Tom: It's definitely not that commit, since it's only from two days ago and it's in 2.6.39-rc4. You can tell by clicking Tree then Makefile.
Comment by Tom Gundersen (tomegun) - Sunday, 24 April 2011, 11:20 GMT
@Frank: I was suggesting that that commit might be addressing the issue (but I am probably wrong...).
Comment by heiko (heiko) - Sunday, 24 April 2011, 15:55 GMT
Willie: I can confirm that 2.6.38-3 fixes this :) Will request closure in a minute...

Jelle: No, I did not report this upstream. Frankly, I was hoping for someone else to do so and thereby spare me the trouble of registering with yet another bugzilla. (On a side note: This "subscribe to this mailing list, register for that bug tracker, join yet another Yahoo Group" is not helping in getting bugs reported to the correct places. I have no better plan available, just don't like the way it is...)

Was about to try now when I noticed bugzilla.kernel.org is down and it looks like the bug disappeared in 2.6.38-3 anyway.

Tom: I don't think this is related, because my system is not using any ide-cd drivers (it is running on pure SCSI, that is). I can't rule it out, though.

Regards, Heiko
Comment by Tom Gundersen (tomegun) - Tuesday, 26 April 2011, 17:27 GMT
@Bill: Is the bug still present with 2.6.38.4?
Comment by Bill Fraser (wrf) - Tuesday, 26 April 2011, 17:34 GMT
I'll try it tonight. *crosses fingers*
