FS#20499 - Kernel Update 2.6.34.2-2 -> 2.6.34.3-1 nvraid dmraid won't boot - LTS still OK

Attached to Project: Arch Linux
Opened by David C. Rankin (drankinatty) - Tuesday, 17 August 2010, 19:45 GMT
Last edited by Gaetan Bisson (vesath) - Saturday, 21 August 2010, 01:21 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To No-one
Architecture x86_64
Severity Critical
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details


After updating the kernel from 2.6.34.2-2 to 2.6.34.3-1, my test server failed to boot. The boot process stopped almost immediately with the following at the top of the screen:

Booting 'Arch Linux on Archangel'

root (hd1,5)
Filesystem Type ext2fs, partition type 0x83
kernel /vmlinuz26 root=/dev/mapper/nvidia_baaccajap5 ro debug
[Linux-BzImage, setup=0x3200, size=0x1ff440]

The box would sit there for 30-60 seconds and then reboot on its own. I booted back into the LTS kernel, which worked fine.

The complete update log is at http://www.3111skyline.com/dl/Archlinux/bugs/pm-updt-8-16.txt

The server is built on an MSI K9N2 board (MS-7374) with a Phenom 9850 processor and 8 GB of RAM. The box has two dmraid arrays:

[22:00 ecstasy:/mnt/arch] # dmraid -r
/dev/sdd: nvidia, "nvidia_baaccaja", mirror, ok, 1465149166 sectors, data@ 0
/dev/sdc: nvidia, "nvidia_fdaacfde", mirror, ok, 976773166 sectors, data@ 0
/dev/sdb: nvidia, "nvidia_baaccaja", mirror, ok, 1465149166 sectors, data@ 0
/dev/sda: nvidia, "nvidia_fdaacfde", mirror, ok, 976773166 sectors, data@ 0

[22:00 ecstasy:/mnt/arch] # dmraid -s
*** Active Set
name : nvidia_baaccaja
size : 1465149056
stride : 128
type : mirror
status : ok
subsets: 0
devs : 2
spares : 0
*** Active Set
name : nvidia_fdaacfde
size : 976773120
stride : 128
type : mirror
status : ok
subsets: 0
devs : 2
spares : 0
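To double-check which disks back which array, the `dmraid -r` listing above can be grouped by set name. The following is a minimal Python sketch, assuming the one-line-per-disk format shown above; `parse_dmraid_r` is a hypothetical helper, not part of dmraid. On this output it puts /dev/sdb and /dev/sdd under nvidia_baaccaja, the set that the root=/dev/mapper/nvidia_baaccajap5 device on the kernel line appears to belong to.

```python
import re
from collections import defaultdict

# Assumed line format, as in the `dmraid -r` output above:
#   /dev/sdd: nvidia, "nvidia_baaccaja", mirror, ok, 1465149166 sectors, data@ 0
DMRAID_R_LINE = re.compile(
    r'^(?P<dev>/dev/\S+): (?P<fmt>\w+), "(?P<set>[^"]+)", (?P<type>\w+), '
    r"(?P<status>\w+), (?P<sectors>\d+) sectors, data@ (?P<offset>\d+)$"
)


def parse_dmraid_r(text: str) -> dict[str, list[str]]:
    """Group member disks by RAID set name (hypothetical helper).

    Lines that do not match the assumed format are skipped.
    """
    sets: defaultdict[str, list[str]] = defaultdict(list)
    for line in text.splitlines():
        m = DMRAID_R_LINE.match(line.strip())
        if m:
            sets[m.group("set")].append(m.group("dev"))
    # Sort members so the result does not depend on dmraid's listing order.
    return {name: sorted(devs) for name, devs in sets.items()}


# The `dmraid -r` output quoted above.
SAMPLE = """\
/dev/sdd: nvidia, "nvidia_baaccaja", mirror, ok, 1465149166 sectors, data@ 0
/dev/sdc: nvidia, "nvidia_fdaacfde", mirror, ok, 976773166 sectors, data@ 0
/dev/sdb: nvidia, "nvidia_baaccaja", mirror, ok, 1465149166 sectors, data@ 0
/dev/sda: nvidia, "nvidia_fdaacfde", mirror, ok, 976773166 sectors, data@ 0
"""

if __name__ == "__main__":
    for name, members in parse_dmraid_r(SAMPLE).items():
        print(name, members)
    # nvidia_baaccaja ['/dev/sdb', '/dev/sdd']
    # nvidia_fdaacfde ['/dev/sda', '/dev/sdc']
```

Sorting the member list is a deliberate choice: dmraid happened to list /dev/sdd before /dev/sdb here, and the grouping should not depend on that.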

The lspci information for the soft-raid controller is:

00:09.0 RAID bus controller: nVidia Corporation MCP78S [GeForce 8200] SATA Controller (RAID mode) (rev a2)
	Subsystem: Micro-Star International Co., Ltd. Device 7374
	Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 28
	I/O ports at b080 [size=8]
	I/O ports at b000 [size=4]
	I/O ports at ac00 [size=8]
	I/O ports at a880 [size=4]
	I/O ports at a800 [size=16]
	Memory at f9e76000 (32-bit, non-prefetchable) [size=8K]
	Capabilities: [44] Power Management version 2
	Capabilities: [8c] SATA HBA v1.0
	Capabilities: [b0] MSI: Enable+ Count=1/8 Maskable- 64bit+
	Capabilities: [ec] HyperTransport: MSI Mapping Enable+ Fixed+
	Kernel driver in use: ahci
	Kernel modules: ahci

Since the box kept rebooting itself, I tried rebuilding the initramfs with:

/sbin/mkinitcpio -k 2.6.34-ARCH -c /etc/mkinitcpio.conf -g /boot/kernel26.img

No change. As a guess, I added ahci to the MODULES line in /etc/mkinitcpio.conf and rebuilt the image again. Still no change; the box keeps rebooting itself.

Getting nowhere, I decided to try the update on my main server (Tyan Computer Tomcat K8E (S2865), sata_nv and dmraid), which has a similar configuration, to see if I could confirm this bug. The update to 2.6.34.3 on my main server worked. What? The RAID setup and lspci info on that box are:

[12:57 nirvana:/home/david] # dmraid -r
/dev/sdb: nvidia, "nvidia_ddddhhfh", mirror, ok, 1465149166 sectors, data@ 0
/dev/sda: nvidia, "nvidia_ddddhhfh", mirror, ok, 1465149166 sectors, data@ 0
[12:57 nirvana:/home/david] # dmraid -s
*** Active Set
name : nvidia_ddddhhfh
size : 1465149056
stride : 128
type : mirror
status : ok
subsets: 0
devs : 2
spares : 0

lspci info:

00:07.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3) (prog-if 85 [Master SecO PriO])
	Subsystem: Tyan Computer Tomcat K8E (S2865)
	Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 23
	I/O ports at 09f0 [size=8]
	I/O ports at 0bf0 [size=4]
	I/O ports at 0970 [size=8]
	I/O ports at 0b70 [size=4]
	I/O ports at d400 [size=16]
	Memory at febfc000 (32-bit, non-prefetchable) [size=4K]
	Capabilities: [44] Power Management version 2
	Kernel driver in use: sata_nv
	Kernel modules: ata_generic, pata_acpi, sata_nv, ide-pci-generic
00:08.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3) (prog-if 85 [Master SecO PriO])
	Subsystem: Tyan Computer Tomcat K8E (S2865)
	Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 22
	I/O ports at 09e0 [size=8]
	I/O ports at 0be0 [size=4]
	I/O ports at 0960 [size=8]
	I/O ports at 0b60 [size=4]
	I/O ports at c000 [size=16]
	Memory at febfb000 (32-bit, non-prefetchable) [size=4K]
	Capabilities: [44] Power Management version 2
	Kernel driver in use: sata_nv
	Kernel modules: ata_generic, pata_acpi, sata_nv, ide-pci-generic

The big difference I see is that my main server 'nirvana' uses:

Kernel driver in use: sata_nv
Kernel modules: ata_generic, pata_acpi, sata_nv, ide-pci-generic

which is working just fine with 2.6.34.3 while the test box 'archangel' uses:

Kernel driver in use: ahci
Kernel modules: ahci

So there is a bug somewhere in 2.6.34.3 in the way it handles nvraid dmraid when the soft-raid controller is driven by the ahci module. For setups where nvraid dmraid uses the sata_nv module, dmraid works fine. Let me know what other information you need or what other tests you want me to run. Why the LTS kernel boots fine on this box while 2.6.34.3 won't is a mystery; whatever differs between the two is where the bug lies.
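The comparison between the two boxes comes down to the "Kernel driver in use" line for each SATA/RAID controller. A minimal Python sketch for pulling that line out of `lspci -v` (or `lspci -k`) text, so two machines can be compared side by side, follows. It assumes each device block starts with a bus:device.function address such as `00:09.0` (no PCI domain prefix); `drivers_in_use`, `ARCHANGEL` and `NIRVANA` are illustrative names, and the sample strings are abbreviated from the lspci output quoted in this report.

```python
import re

# Assumption: device blocks start with a bus:device.function address such as
# "00:09.0" (plain `lspci -v` / `lspci -k`, no `-D` domain prefix).
PCI_HEADER = re.compile(r"^[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7] ")


def drivers_in_use(lspci_text: str) -> dict[str, str]:
    """Map each device header line to its 'Kernel driver in use' value.

    Devices with no such line (no driver bound) are left out.
    Hypothetical helper for illustration only.
    """
    result: dict[str, str] = {}
    current = None
    for raw in lspci_text.splitlines():
        line = raw.strip()  # works whether or not attribute lines are indented
        if PCI_HEADER.match(line):
            current = line
        elif current is not None and line.startswith("Kernel driver in use:"):
            result[current] = line.split(":", 1)[1].strip()
    return result


# Abbreviated excerpts of the lspci output quoted in this report.
ARCHANGEL = """\
00:09.0 RAID bus controller: nVidia Corporation MCP78S [GeForce 8200] SATA Controller (RAID mode) (rev a2)
\tSubsystem: Micro-Star International Co., Ltd. Device 7374
\tKernel driver in use: ahci
\tKernel modules: ahci
"""

NIRVANA = """\
00:07.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3) (prog-if 85 [Master SecO PriO])
\tKernel driver in use: sata_nv
\tKernel modules: ata_generic, pata_acpi, sata_nv, ide-pci-generic
00:08.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3) (prog-if 85 [Master SecO PriO])
\tKernel driver in use: sata_nv
\tKernel modules: ata_generic, pata_acpi, sata_nv, ide-pci-generic
"""

if __name__ == "__main__":
    for host, text in (("archangel", ARCHANGEL), ("nirvana", NIRVANA)):
        for header, driver in drivers_in_use(text).items():
            print(f"{host}: {header.split()[0]} -> {driver}")
    # archangel: 00:09.0 -> ahci
    # nirvana: 00:07.0 -> sata_nv
    # nirvana: 00:08.0 -> sata_nv
```

Keying on the header line rather than on "Kernel modules" keeps the comparison to the driver actually bound, which is the difference (ahci vs. sata_nv) the report points at.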


Closed by Gaetan Bisson (vesath)
Saturday, 21 August 2010, 01:21 GMT
Reason for closing: None
Additional comments about closing: partially fixed by upstream upgrade;
for problems with 2.6.25.2, please create another bug report
