FS#22343 - Random kernel panics after udev upgrade

Attached to Project: Arch Linux
Opened by H.pferd (stosch) - Thursday, 06 January 2011, 20:32 GMT
Last edited by Tobias Powalowski (tpowa) - Monday, 24 January 2011, 18:46 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To Tobias Powalowski (tpowa)
Architecture All
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 49
Private No

Details

Description: For the past few days (probably since the udev upgrade to 165) I have been seeing random kernel panics on several machines (laptop and desktop). They occur in two situations:
- when I insert a DVD (no automount)
- rarely, when I boot up.
There might be a connection to this forum thread: https://bbs.archlinux.org/viewtopic.php?id=111197
even though the display output is slightly different (see attached file).
The problem seems to be solved by downgrading to the older udev 164-3.


Additional info:
udev 165-1
picture with display output


Steps to reproduce:
None found; the panics are random but fairly frequent.
   panic.JPG (513.2 KiB)

Closed by  Tobias Powalowski (tpowa)
Monday, 24 January 2011, 18:46 GMT
Reason for closing:  Fixed
Additional comments about closing:  patch added to all kernels
Comment by Artem A Klevtsov (unikum) - Thursday, 06 January 2011, 21:41 GMT
On my laptop, kernel panics appear on boot with the message: "kernel panic not syncing: fatal exception in interrupt". I don't know the reason yet.
Comment by Balló György (City-busz) - Friday, 07 January 2011, 06:48 GMT
I also sometimes get a similar error since udev was updated to 165-1, when my USB hard drive is attached to the system during boot. I have to disconnect the drive and reset to boot again.
Comment by Abdullah Özkan (soundman) - Friday, 07 January 2011, 11:37 GMT
I have the same problem on both a laptop and a desktop computer. It's random; after several resets it boots normally.
Comment by H.pferd (stosch) - Friday, 07 January 2011, 12:43 GMT
Comment by rubble (rubble) - Friday, 07 January 2011, 19:48 GMT
I have the same problem with udev 165-1 when booting with the main image. Fallback image okay.
Comment by Angelo Platti (berseker) - Saturday, 08 January 2011, 09:00 GMT
I confirm this bug on my testing system. For me it happens randomly, and only at boot time.
Comment by David (amen) - Saturday, 08 January 2011, 10:43 GMT
I have the same error. It happens randomly.
Comment by andres andres (madek) - Saturday, 08 January 2011, 18:48 GMT
Same error here: random kernel panics during the udev autodetect process; the fallback image works normally.
Comment by andrews (andrews) - Saturday, 08 January 2011, 19:20 GMT
Confirmed kernel panic with libata stuff, downgrading to udev 164-3 and no more panics.
Comment by Tobias Powalowski (tpowa) - Saturday, 08 January 2011, 19:24 GMT
Have you tried the following:
udevadm info --convert-db
and rebuild your initramfs by running mkinitcpio -p kernel26
Comment by andrews (andrews) - Sunday, 09 January 2011, 05:51 GMT
I tried udev 165-1 again, both before and after a reboot, running
udevadm info --convert-db and rebuilding the initramfs for both kernel26 2.6.36 and 2.6.37 from testing.
I logged in to my desktop, put a DVD in the drive and it panicked again; sometimes it also panics on boot.
Downgrading to udev 164-3 is the only workaround that helps for now.
Comment by Abdullah Özkan (soundman) - Sunday, 09 January 2011, 07:32 GMT
@tpowa,
I tried with the stock kernel. The problem continues.
Comment by biginoz (biginoz) - Sunday, 09 January 2011, 08:10 GMT
Confirmed
Same error here: random kernel panics during the udev autodetect process; the fallback image works normally.
Comment by Emmanuel (oktoberfest) - Sunday, 09 January 2011, 13:18 GMT
After upgrading to udev 165-1 I got a kernel panic at each boot, so I downgraded to udev 164-3.
Following Tobias's suggestion, here is what I did to solve the problem:
- upgrade to udev 165-1
- udevadm info --convert-db
- boot many times. No kernel panic
- rebuild initramfs
- boot many times. No kernel panic

So for me the problem is gone. I do not use my Arch system to play DVDs, so I haven't checked that case.
Comment by DaNiMoTh (DaNiMoTh) - Sunday, 09 January 2011, 14:34 GMT
Attached a workaround for udev, which doesn't call the bugged code (probably in the kernel)

Comment by mikey rabid (rabid_works) - Sunday, 09 January 2011, 14:45 GMT
I hadn't had any kernel panics since the 6th of January until this morning. I have tried running udevadm info --convert-db as suggested in the above comments, followed by mkinitcpio -p kernel26. I then rebooted and was greeted by the same freeze. After running mkinitcpio, the fallback image also froze on me. I have downgraded to the previous version of udev for now.

@Emmanuel : are you 100% sure that the problem is gone ? It can be sporadic.
Comment by H.pferd (stosch) - Sunday, 09 January 2011, 16:21 GMT
Problem still occurs.
Comment by Mario Kozjak (archman-cro) - Monday, 10 January 2011, 10:14 GMT
Here is my post: https://bbs.archlinux.org/viewtopic.php?pid=876586#p876586

Basically, it doesn't get stuck when I use udev 164-3 with the stock ARCH kernel, but it still gets stuck when trying to boot with a bfs-patched kernel.

@stosch: Using bfs by any chance?

Update: The system definitely doesn't get stuck when trying to boot kernel without bfs.
Comment by Emmanuel (oktoberfest) - Monday, 10 January 2011, 20:38 GMT
@mikey rabid : I started my arch this evening and... kernel panic again :(
So the problem still occurs.
Comment by Fabio Francisco Domingues (fdomingues) - Tuesday, 11 January 2011, 10:42 GMT
I have the same error, which occurs at boot time only. The fallback image is OK, and after a fallback boot I can boot using the standard image again.

I've cleaned my pkg cache. Does anyone know where I can download the udev 164 package? Thanks!
Comment by Balló György (City-busz) - Tuesday, 11 January 2011, 10:54 GMT
Here you can download it:
i686: http://schlunix.org/archlinux/core/os/i686/udev-164-3-i686.pkg.tar.xz
x86_64: http://schlunix.org/archlinux/core/os/x86_64/udev-164-3-x86_64.pkg.tar.xz

But I think it would be better if the Arch devs moved 165-1 back to [testing] until we know what caused this critical bug for many users.
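
A minimal sketch of the downgrade described above, assuming the two package URLs in this comment are still reachable. The helper name udev_164_url is hypothetical, and the script only prints the wget and pacman -U commands rather than running them, so nothing is downloaded or installed.

```shell
#!/bin/sh
# Print the udev 164-3 package URL quoted in the comment above for a given
# architecture (i686 or x86_64). Returns non-zero for anything else.
# udev_164_url is a hypothetical helper name, not part of any Arch tool.
udev_164_url() {
    case "$1" in
        i686|x86_64)
            echo "http://schlunix.org/archlinux/core/os/$1/udev-164-3-$1.pkg.tar.xz"
            ;;
        *)
            echo "unsupported architecture: $1" >&2
            return 1
            ;;
    esac
}

# Only print the commands; run them by hand (pacman -U as root) if they look right.
arch="$(uname -m)"
if url="$(udev_164_url "$arch")"; then
    echo "wget $url"
    echo "pacman -U ${url##*/}"
fi
```

After a downgrade, the initramfs still contains whichever udev it was built with, so rebuilding it with mkinitcpio -p kernel26 is probably needed as well.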
Comment by Fabio Francisco Domingues (fdomingues) - Tuesday, 11 January 2011, 13:39 GMT
Thank you. It seems that udev 164 solved the problem!
Comment by Alex (Alex Arch User) - Tuesday, 11 January 2011, 15:15 GMT
Just subscribing...
Comment by Nagy Gabor (combo) - Tuesday, 11 January 2011, 21:53 GMT
Same issue. Fallback image also freezes here.
Comment by Kim (dedanna1029) - Wednesday, 12 January 2011, 06:57 GMT
I just performed the kernel upgrade. I did not get any errors or panics after upgrading udev to 165-1 initially back on Dec. 16, but with the new kernel update that I just updated to this evening, I now get the kernel panic. I myself can not say concretely that it's udev, therefore. I think it may be a kernel issue, either that, or something with the new kernel isn't working with the latest udev.

Right after I updated this evening, I rebooted, and came up to a string of code, all of it being greek to me, and a panic right there. I rebooted, tried the fallback image, and it is working fine so far. I am afraid to do anything further with my system, as I'm afraid I'll totally bork it - I'm going to keep my system up solidly for the next couple of days, and see if there is a kernel fix that comes in - if not, then I'll downgrade udev and the kernel both to be on the safe side.

Thanks for hearing me out.

My system:

lsmod
Module Size Used by
usb_storage 34044 1
fuse 56816 3
lm85 13885 0
hwmon_vid 2008 1 lm85
ext2 55656 1
mbcache 4298 1 ext2
nvidia 9234898 28
snd_seq_dummy 1079 0
snd_seq_oss 25040 0
snd_seq_midi_event 4528 1 snd_seq_oss
snd_seq 41688 5 snd_seq_dummy,snd_seq_oss,snd_seq_midi_event
snd_pcm_oss 33694 0
snd_emu10k1 124455 1
snd_intel8x0 22230 2
snd_rawmidi 15288 1 snd_emu10k1
snd_mixer_oss 14654 1 snd_pcm_oss
snd_ac97_codec 87943 2 snd_emu10k1,snd_intel8x0
ac97_bus 762 1 snd_ac97_codec
snd_pcm 59136 4 snd_pcm_oss,snd_emu10k1,snd_intel8x0,snd_ac97_codec
firewire_ohci 23548 0
snd_seq_device 4369 5 snd_seq_dummy,snd_seq_oss,snd_seq,snd_emu10k1,snd_rawmidi
e100 27295 0
mii 3198 1 e100
firewire_core 42849 1 firewire_ohci
snd_timer 15583 3 snd_seq,snd_emu10k1,snd_pcm
snd_util_mem 1820 1 snd_emu10k1
agpgart 22816 1 nvidia
crc_itu_t 1053 1 firewire_core
snd_hwdep 4764 1 snd_emu10k1
uhci_hcd 19091 0
snd 43219 18 snd_seq_oss,snd_seq,snd_pcm_oss,snd_emu10k1,snd_intel8x0,snd_rawmidi,snd_mixer_oss,snd_ac97_codec,snd_pcm,snd_seq_device,snd_timer,snd_hwdep
soundcore 4929 1 snd
iTCO_wdt 8677 0
i2c_i801 6946 0
snd_page_alloc 5981 3 snd_emu10k1,snd_intel8x0,snd_pcm
ehci_hcd 32908 0
ppdev 4862 0
parport_pc 27832 1
shpchp 23037 0
psmouse 49765 0
lp 6652 0
iTCO_vendor_support 1433 1 iTCO_wdt
pci_hotplug 21523 1 shpchp
usbcore 115866 4 usb_storage,uhci_hcd,ehci_hcd
i2c_core 15762 3 lm85,nvidia,i2c_i801
parport 25499 3 ppdev,parport_pc,lp
pcspkr 1359 0
serio_raw 3566 0
thermal 9690 0
evdev 6692 7
processor 22776 0
button 3746 0
reiserfs 225719 3
sg 21028 0
sd_mod 24384 7
sr_mod 13217 0
cdrom 31378 1 sr_mod
pata_acpi 2308 0
ata_generic 2215 0
ata_piix 17935 4
libata 140308 3 pata_acpi,ata_generic,ata_piix
scsi_mod 106955 5 usb_storage,sg,sr_mod,sd_mod,libata

uname -a
Linux dedanna.rocks.net 2.6.36-ARCH #1 SMP PREEMPT Sat Jan 8 13:16:43 UTC 2011 i686 Intel(R) Pentium(R) 4 CPU 2.40GHz GenuineIntel GNU/Linux

Oh, btw, meant to mention also that my FreeAgent external usb drive was NOT hotplugged when I updated this evening. Normally I do leave it hotplugged, and it does fine through kernel updates, etc., but I had received a heads up from our site admin at our forum, so I did unmount it this time before I updated - start reading here: http://bjoernvold.com/forum/viewtopic.php?p=6826#p6826
Comment by Kim (dedanna1029) - Wednesday, 12 January 2011, 07:01 GMT
If you have any further questions, feel free to ask. I am more than willing to do whatever I can to see this resolved.

Thanks.
Comment by Kim (dedanna1029) - Wednesday, 12 January 2011, 07:02 GMT
Comment by Alex (Alex Arch User) - Wednesday, 12 January 2011, 16:10 GMT
Does it affect Gentoo systems? Any Gentoo users?
Comment by Jelle van der Waa (jelly) - Wednesday, 12 January 2011, 16:15 GMT
Check Gentoo's bug tracker / forums if you want to know whether it affects them.
Comment by Gavin Bisesi (Daenyth) - Wednesday, 12 January 2011, 18:44 GMT
I have this problem as well on every boot.

This needs a news post. I'm glad I'm working from home today or I'd be totally hosed. People shouldn't have to check the bug tracker to know if they will have a usable system after upgrading.

Downgrading to udev 164 resolves the issue
Comment by Leonid Isaev (lisaev) - Wednesday, 12 January 2011, 19:23 GMT
> People shouldn't have to check the bug tracker to know if they will have a usable system after upgrading

Well, apparently the issue is elusive... On my 2 i686 and 2 x86_64 systems, I haven't experienced any crashes since Jan 5 (and two of them run 24/7). No problems with DVDs under KDE x64 either. Apparently I wasn't lucky enough regarding my hardware :)
Comment by Kim (dedanna1029) - Wednesday, 12 January 2011, 23:49 GMT
> People shouldn't have to check the bug tracker to know if they will have a usable system after upgrading

And I experienced nothing after the udev update (which I got on Dec. 16th) until the kernel update last night. I am still suspicious that it's actually the kernel's fault, that it's not able to work with current udev. But however you want to look at that I guess.
Comment by Kim (dedanna1029) - Thursday, 13 January 2011, 00:03 GMT
> People shouldn't have to check the bug tracker to know if they will have a usable system after upgrading

Also, considering, as in the link I left earlier in this thread, it is obviously NOT Arch's fault. This is upstream. Considering Arch's track record, I think a comment like that is totally unwarranted. It is very rare when there are bugs in Arch, at least in my own experience, so am patient when things arise.
Comment by Jonathan Liu (net147) - Thursday, 13 January 2011, 07:27 GMT
udev doesn't rebuild kernel initcpio images when it is installed. The initcpio images contain a copy of udev.
I suspect in Kim's case, udev was updated on Dec. 16th but the initcpio image still had the previous version of udev.
After she updated the kernel, the initcpio image was rebuilt and udev in the initcpio image was updated to the latest version (which crashes on boot).
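
A rough way to check for the situation described above is to compare file timestamps: if the initcpio image is older than the installed udev daemon, the image cannot contain that udev build. This is only a heuristic sketch. The paths /sbin/udevd and /boot/kernel26.img are assumptions about a stock Arch install of that period, image_predates is a hypothetical helper name, and the -ot test is a common shell extension (bash, dash, ksh) rather than strict POSIX.

```shell
#!/bin/sh
# image_predates FILE IMAGE: succeeds when IMAGE is older than FILE.
# Hypothetical helper; relies on the non-POSIX but widely supported -ot test.
image_predates() {
    [ "$2" -ot "$1" ]
}

# Assumed default locations; override via the environment if yours differ.
UDEVD="${UDEVD:-/sbin/udevd}"
IMAGE="${IMAGE:-/boot/kernel26.img}"

if [ -e "$UDEVD" ] && [ -e "$IMAGE" ]; then
    if image_predates "$UDEVD" "$IMAGE"; then
        echo "$IMAGE is older than $UDEVD: it still contains an earlier udev"
    else
        echo "$IMAGE is newer than $UDEVD: it was probably rebuilt with the current udev"
    fi
else
    echo "skipping check: $UDEVD or $IMAGE not found"
fi
```

A "newer" result is not conclusive, since the timestamp on the binary may reflect when the package was built rather than when it was installed.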
Comment by Semen Soldatov (simplexe) - Thursday, 13 January 2011, 10:33 GMT
I have the same bug.
Comment by Phil Collins (murr) - Thursday, 13 January 2011, 16:33 GMT
I also have this problem. Initially it happened only when inserting or ejecting a CD but then - probably after a kernel update - it started to happen on booting as well. The fallback image was no better but repeated resets would eventually succeed. Downgrading to udev 164 cures the problem. Another machine with different (and newer) hardware appears to be unaffected.
Comment by Ricardo A. Moura Rubio (ricardomoura) - Thursday, 13 January 2011, 17:55 GMT
I can confirm this. Happening with LTS kernel too.
Comment by Thomas Hagen (hagt) - Thursday, 13 January 2011, 23:09 GMT
The same happens to me. At first the oops only appeared when opening the CD drive. Since the last kernel update the panic also happens during boot when udev is started. The fallback image works.
Comment by Kim (dedanna1029) - Friday, 14 January 2011, 14:18 GMT
"She", Jonathan, "she". Thanks.
Comment by Thomas Hagen (hagt) - Friday, 14 January 2011, 16:06 GMT
The same happens to me. At first the oops only appeared when opening the CD drive. Since the last kernel update the panic also happens during boot when udev is started. The fallback image works.
Comment by Kim (dedanna1029) - Saturday, 15 January 2011, 01:08 GMT
There is an ongoing thread on this at the Arch forum...

https://bbs.archlinux.org/viewtopic.php?id=111197

"Keep in mind that this is not a udev bug. It's a kernel bug that udev165 tickles." https://bbs.archlinux.org/viewtopic.php?pid=878331#p878331 - just like I thought.

"I just realized that running the kernel with 'edd=off' seems to allow me to boot now. Only tried it a few times but I was getting 90% boot failure prior to this and I haven't not been able to boot yet. Might help with some?!?!?!" - https://bbs.archlinux.org/viewtopic.php?pid=878334#p878334 - this last suggestion here does NOT work for me.

Thanks.
Comment by Frank Verlinden (butterfly) - Saturday, 15 January 2011, 23:56 GMT
Hi,

I have this problem occurring on at least three archlinux systems. And adding edd=off on the kernel line does not help.
The problem occurs at boot time; the fallback image does not make any difference. The messages you see differ from system to system and sometimes from boot to boot. My Samsung NC10 wouldn't even boot after 5 attempts. I could not reproduce the problem on an x64 system; thus far it has only occurred on i686 systems. It may also occur on a system booted from an external USB HDD.

Files added:
img_9972.jpg: samsung NC10 (only system using grub 2, others use grub 1, always exactly the same messages)
img_9974.jpg: P4P800-X first boot after upgrade to kernel 2.6.36.3-1 (but it happened before)
img_9975.jpg: P4P800-X second boot (third time it booted correctly)
img_9976.jpg: PX845PE
img_9977.jpg: PX845PE with edd=off
img_9979.jpg: PX845PE using fallback image
img_9980.jpg: PX845PE booted from external usb hdd
Comment by David Spicer (azleifel) - Monday, 17 January 2011, 20:04 GMT
I've got all SATA drives, so no kernel panics but pauses during boot while the problematic udev code provoked the kernel bug:
Jan 15 07:38:33 localhost kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jan 15 07:38:33 localhost kernel: ata3.00: failed command: IDENTIFY PACKET DEVICE
Jan 15 07:38:33 localhost kernel: ata3.00: cmd a1/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in
Jan 15 07:38:33 localhost kernel: res 40/00:03:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
Jan 15 07:38:33 localhost kernel: ata3.00: status: { DRDY }
Jan 15 07:38:33 localhost kernel: ata3: hard resetting link
Jan 15 07:38:33 localhost kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Jan 15 07:38:33 localhost kernel: ata3.00: configured for UDMA/100
Jan 15 07:38:33 localhost kernel: ata3: EH complete

First workaround that worked for me was the panic.patch posted here by DaNiMoTh. I've now gone the Slackware -current route and rebuilt udev with the commit that's causing all the problems reverted.
Comment by solsTiCe (zebul666) - Tuesday, 18 January 2011, 13:28 GMT
I get a kernel panic when (cold) booting with a DVD in the drive tray. It always works the second time, which is also a "cold" boot because I don't have a reboot button on my PC.

a picture of the kernel panic http://imgur.com/26tFP
Comment by jozef riha (jose1711) - Tuesday, 18 January 2011, 20:01 GMT
confirming this on dell d600 and my wife's desktop
Comment by Christian Galander (igprolin) - Tuesday, 18 January 2011, 21:30 GMT
I can confirm it on an IBM ThinkPad T43 (KMS enabled) and on a PC with an NVIDIA graphics card (KMS disabled).
Comment by Kim (dedanna1029) - Tuesday, 18 January 2011, 22:26 GMT
Our forum admin swears that edd=off works, but that people have to not have their usb drives hotplugged. I've tried several times with nothing hotplugged, no cd or dvd in my dvd drive, and it's still an epic fail.

http://bjoernvold.com/forum/viewtopic.php?p=6938#p6938

I think I'm going to downgrade udev and the kernel both. The problem with that being, that I'll probably screw it up, not knowing much about mkinitcpio, and always afraid to try it even when I get a walkthrough.
Comment by akuschki (akuschki) - Tuesday, 18 January 2011, 22:57 GMT
I can confirm this on a Thinkpad T42 with Radeon graphics (kms enabled)
Comment by Pawel Stolowski (stolowski) - Tuesday, 18 January 2011, 23:35 GMT
I can confirm this on Thinkpad T61 with NVidia graphics.
Comment by Loaden YC (loaden) - Tuesday, 18 January 2011, 23:59 GMT
I can confirm this on Intel_915 board.
Comment by Börje Holmberg (linfan) - Wednesday, 19 January 2011, 00:07 GMT
For me this only occurs with amd64 running 32 bits. It is a nvidia mobo.
On my two amd64, running 64 bits, this does not happen.
Comment by Jem Orgun (jeorgun) - Wednesday, 19 January 2011, 00:18 GMT
I can confirm this on Thinkpad T60, regardless of dvds/hotplugging.
Comment by Olaf (Haderlump) - Wednesday, 19 January 2011, 06:47 GMT
For me it happens also only on my 32-bit installation, not on the 64-bit box.
Comment by Kim (dedanna1029) - Wednesday, 19 January 2011, 14:31 GMT
@Tobias, re: "Have you tried the following:
udevadm info --convert-db
and rebuild your initramfs by running mkinitcpio -p kernel26"

I've noted from the Arch forum, that patches, etc. have been messing up Fallback as well as the main kernel for a good fair few. I'm afraid to do anything at this point, as Fallback is the only way I can get into my system. How safe would this be to run?

Edit: Just tried it anyway, and guess what? It worked. This is without my external drives hotplugged, and with no cd/dvd in my dvd drive. I also have several more reboots to go to confirm it's a keeper.

I should say, this is exactly what I did. It's not like I just whipped out a root terminal while in the desktop, and did it. I didn't. There was a bit more to it. https://bbs.archlinux.org/viewtopic.php?pid=880526#p880526

You are a god. Thank you.
Comment by Börje Holmberg (linfan) - Wednesday, 19 January 2011, 16:16 GMT
When the new stock kernel26 arrived, the problems started. I was then told that it is udev, so I downgraded udev and ran mkinitcpio and mkinitcpio -G IMAGE, but the problem persists. Maybe the kernel needs to be downgraded as well. It seems "old" udev and "new" kernel26 don't work very well together.
Comment by Kim (dedanna1029) - Wednesday, 19 January 2011, 17:02 GMT
What happens if you try as in the previous post to yours here? In that way?

umount all external usb drives, make sure no other hard drives are mounted, make sure there is no cd or dvd in the cd/dvd drives, then:
Log out
Hit Ctrl+Alt+F1
Log into root
Then, do:
init 3
udevadm info --convert-db
mkinitcpio -p kernel26
Then, reboot.

It's simple to do, and worked for me where nothing else did.
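
The two commands at the heart of these steps can be wrapped in a small script. This is only a sketch: the DRY_RUN switch and the function names are assumptions added for illustration, and by default the script just prints what it would run. Setting DRY_RUN=0 and running it as root, from a console after init 3 as described above, would execute the real commands.

```shell
#!/bin/sh
# Sketch of the workaround above. DRY_RUN=1 (the default chosen here for
# safety, not something udev or mkinitcpio know about) only prints commands.
DRY_RUN="${DRY_RUN:-1}"

# run CMD [ARGS...]: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Convert the udev database, then rebuild the initramfs for the kernel26 preset.
rebuild_after_udev_upgrade() {
    run udevadm info --convert-db
    run mkinitcpio -p kernel26
}

rebuild_after_udev_upgrade
```

Keeping the dry run as the default means a copy-and-paste cannot touch /boot by accident, which matters when the fallback image is the only one that still boots.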
Comment by Kim (dedanna1029) - Wednesday, 19 January 2011, 17:09 GMT
Just rebooted with one of my external usb drives hotplugged. Still works a charm. :)
Comment by David Kremer (dkremer) - Thursday, 20 January 2011, 09:44 GMT
I'm currently installing the 2.6.38-rc1.

Would it be fixed by installing the latest kernel tree? Is there a bug report for it? Also, I cannot find Tejun Heo's latest commit in the kernel tree, so I'm not sure the bug has been corrected.

http://comments.gmane.org/gmane.linux.hotplug.devel/16438
http://www.linuxquestions.org/questions/slackware-14/current-randomly-timed-kernel-oops-on-bootup-of-two-test-boxen-852843/

Comment by Jerry Stillman (meganox) - Thursday, 20 January 2011, 11:25 GMT

I gather from Tejun's comments that he doesn't actually agree that he broke it. Doesn't seem to me they've identified the issue yet so I wouldn't be optimistic about seeing a fix yet. I'm hobbling along with a simple patch to udev to remove the offending call but I'm hoping for a real fix soon as it's time to update the system again, haven't done it in over a week now (eek).
Comment by David Kremer (dkremer) - Thursday, 20 January 2011, 11:57 GMT
Hello, in the first link Greg Kroah-Hartman's comment was that user-space code shouldn't be able to panic the kernel. So…
Comment by Brad Price (beerad) - Thursday, 20 January 2011, 18:34 GMT
Tejun actually started a new thread on the linux-hotplug list. He posted a final version of the patch this morning (http://permalink.gmane.org/gmane.linux.hotplug.devel/16492). I've been using it for a couple of days and it's been flawless. If you want to test it, I would get the 2.6.37 source tarball from kernel.org and apply the patch to that. I doubt it's in -rc1.
Comment by Kim (dedanna1029) - Thursday, 20 January 2011, 20:20 GMT
How exactly do we apply this patch? I'm sorry, I've never done any kernel patching before, and would like to try it, if it's not just for the testing version .37 and above. Thanks.

I as well haven't updated since I got the bug.

uname -r
2.6.36-ARCH
Comment by Brad Price (beerad) - Thursday, 20 January 2011, 21:18 GMT
To apply this kernel patch, you need to build a "custom" kernel. If you've never built one before the first step would be to follow this wiki: https://wiki.archlinux.org/index.php/Kernel_Compilation_From_Source and install and boot the new kernel. Once you have that working, applying the patch and rebuilding is not a big deal. Be aware that if you use the default Arch .config file it might take a couple of hours cpu time to compile the kernel and modules. If you do decide to do this and need some help, post over in the forum thread and I'll give you a hand. I would strongly recommend that you give the new kernel a different name, and keep the current Arch kernel and initrd in /boot and Grub, so if your new kernel won't boot, you can just switch to the Arch kernel in grub.
Comment by Kim (dedanna1029) - Thursday, 20 January 2011, 22:05 GMT
Okay, thanks so much. I think I'd have to do this late at night my time, then. Ugh.

How do I do the other part of this, "give the new kernel a different name, and keep the current Arch kernel and initrd in /boot and Grub" - when a new kernel's installed, the old one uninstalls in regular updates. I guess I don't have the hang of keeping the old kernel in Arch yet.

Thanks so much.
Comment by Brad Price (beerad) - Thursday, 20 January 2011, 23:26 GMT
I use something similar to method 1 in the wiki, and do everything manually. That way when I copy the files into /boot I can choose the filename. I don't build in /usr/src; I unpack the tarball in /home/brad/linux, so it creates the source tree as /home/brad/linux/linux-2.6.37. Then I am root only when I do the "make modules_install" and work in /boot. Also, if you have multiple cores or multiple cpus, do a "make -jN" where N = (number_of_cpus * number_of_cores_per_cpu) + 1. There are two Arch wikis on this, too -- the other one is different.
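
The job-count rule above is simple arithmetic and can be sketched as a small shell function. The name make_jobs is hypothetical, and the CPU and core counts are passed in by hand rather than detected.

```shell
#!/bin/sh
# make_jobs CPUS CORES_PER_CPU
# Prints N = (CPUS * CORES_PER_CPU) + 1, the value suggested for "make -jN".
make_jobs() {
    echo $(( $1 * $2 + 1 ))
}

# Example: a single dual-core CPU.
echo "make -j$(make_jobs 1 2)"
```

The extra job is a common rule of thumb intended to keep every core busy while another job waits on disk I/O.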

There's lots of info on the net on building kernels. O'Reilly has an online book by Greg Kroah-Hartman (Linux Kernel in a Nutshell?) that has a lot of good info in it.
Comment by Ricardo A. Moura Rubio (ricardomoura) - Friday, 21 January 2011, 15:51 GMT
Sad news:

It's still happening sometimes on boot, although rarely, even with udev 164-3. I'm using the latest LTS kernel; I haven't tried the normal kernel yet.
Comment by David Kremer (dkremer) - Friday, 21 January 2011, 16:19 GMT
Sounds to me like the kernel is crappy. :( Does anyone know what is actually the problem ?
Comment by rubble (rubble) - Friday, 21 January 2011, 19:06 GMT
I've had no problems booting since running "udevadm info --convert-db" and rebuilding my initramfs. It was almost impossible to boot from the main image beforehand.

In case it adds anything, my setup includes:
1xSATA HDD
2xPATA DVD drives (Liteon & Samsung)
Comment by Brad Price (beerad) - Saturday, 22 January 2011, 01:03 GMT
I'll try this once more, inline this time. Maybe the link didn't stand out enough??
http://permalink.gmane.org/gmane.linux.hotplug.devel/16492

From: Tejun Heo <htejun@xxxxxxxx>
Subject:[PATCH #upstream-fixes] libata: set queue DMA alignment to sector size for ATAPI too
Date: Thu, 20 Jan 2011 13:59:06 +0100 (01/20/2011 05:59:06 AM)

ata_pio_sectors() expects buffer for each sector to be contained in a
single page; otherwise, it ends up overrunning the first page. This
is achieved by setting queue DMA alignment. If sector_size is smaller
than PAGE_SIZE and all buffers are sector_size aligned, buffer for
each sector is always contained in a single page.

This wasn't applied to ATAPI devices but IDENTIFY_PACKET is executed
as ATA_PROT_PIO and thus uses ata_pio_sectors(). Newer versions of
udev issue IDENTIFY_PACKET with unaligned buffer triggering the
problem and causing oops.

This patch fixes the problem by setting sdev->sector_size to
ATA_SECT_SIZE on ATAPI devices and always setting DMA alignment to
sector_size. While at it, add a warning for the unlikely but still
possible scenario where sector_size is larger than PAGE_SIZE, in which
case the alignment wouldn't be enough.
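
The alignment argument in the patch description can be checked with plain arithmetic. The sketch below assumes a 512-byte sector and a 4096-byte page (typical x86 values, not taken from the patch itself) and uses a hypothetical helper, fits_in_one_page, to test whether a buffer stays inside one page. It models only the address arithmetic, not libata.

```shell
#!/bin/sh
# Assumed sizes for illustration: 512-byte sectors, 4096-byte pages.
SECTOR_SIZE=512
PAGE_SIZE=4096

# fits_in_one_page OFFSET SIZE PAGE_SIZE
# Succeeds when the buffer [OFFSET, OFFSET + SIZE) lies within a single page.
fits_in_one_page() {
    first_page=$(( $1 / $3 ))
    last_page=$(( ($1 + $2 - 1) / $3 ))
    [ "$first_page" -eq "$last_page" ]
}

# Every sector-aligned buffer across two pages stays within one page.
all_fit=yes
offset=0
while [ "$offset" -lt $(( 2 * PAGE_SIZE )) ]; do
    fits_in_one_page "$offset" "$SECTOR_SIZE" "$PAGE_SIZE" || all_fit=no
    offset=$(( offset + SECTOR_SIZE ))
done
echo "all sector-aligned buffers fit in one page: $all_fit"

# An unaligned buffer near the end of a page spills into the next one,
# which is the overrun case the patch description refers to.
if fits_in_one_page 3900 "$SECTOR_SIZE" "$PAGE_SIZE"; then
    echo "unaligned buffer at offset 3900 fits in one page"
else
    echo "unaligned buffer at offset 3900 crosses a page boundary"
fi
```

Because 4096 is a multiple of 512, a buffer that starts on a sector boundary always ends on or before the next page boundary; this is why the fix enforces sector-size alignment for ATAPI devices too.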
Comment by Jerry Stillman (meganox) - Saturday, 22 January 2011, 01:21 GMT
Brad, thanks for the info, I think the problem is people are looking for a fix for their current kernel rather than patching an unknown kernel. I'm going to have a good look at the sources tomorrow and see if I can backport the patch. Apologies if I missed something, been trying to follow all this on my phone.
Comment by Brad Price (beerad) - Saturday, 22 January 2011, 01:54 GMT
Thanks for the reply, Jerry. I thought it might have been getting overlooked. I'll stand down now, and good luck with the backport.
Comment by birney titus (patanjali) - Saturday, 22 January 2011, 02:59 GMT
Just wanted to confirm that I too have this problem with the long-term-support kernel, though it's only about one boot out of every 10.
Name : kernel26-lts
Version : 2.6.32.28-1
Name : udev
Version : 165-1


Comment by Kim (dedanna1029) - Saturday, 22 January 2011, 07:04 GMT
This is my problem with this. I know little enough about building kernels, and am afraid that is a job that I'd definitely mess up. We really don't want to patch an unknown kernel, we'd just like a fix for what we have, which doesn't seem difficult. We need to fix what we have. I don't want to have to spend 2 hours (or possibly more) of what little time I have, trying to do something that I think in the end I will mess up, when the fix seems simple. If there's a patch for the kernel we have, why can't we just get that into the repos from upstream or wherever, so when we do updates, our kernels fix normally?

I'm not a developer nor packager myself, nor am I running Arch testing, and really do not want to attempt the above. I'm only someone who helps with bugs, and does what the devels tell her to do in order to see what works, and what doesn't. I'd much rather get what I can from updates. I took a simple route to get working again for now, and it worked - another user has reported the same. Can we move on with a fix for the repos? If you'll notice, right now, I'm only on this kernel:

pacman -Q kernel26
kernel26 2.6.36.3-1

I am nowhere near ready in all honesty to try something this deep in the unknown. I've been running Arch solidly yes, but for just under a year, and am still in the learning stages, I think, with Arch. I have general knowledge of Linux, yes, ran Cooker for Mandriva for a couple of years, and contributed to many bugs for that. That's the extent of what knowledge I have, having run Linux in general for over 10 years now. I have installed and set up Arch for many many people, including myself, and to include businesses, but after that, with others, I hand them the wiki and let them go for it.

Thanks.
Comment by Jonathan Liu (net147) - Saturday, 22 January 2011, 07:58 GMT
Here is a backport of the patch to Linux 2.6.32.
It should also apply cleanly to Linux 2.6.36.

Does not compile yet.
Comment by Jonathan Liu (net147) - Saturday, 22 January 2011, 09:38 GMT
Here is a fixed version of the patch for Linux 2.6.32 and 2.6.36. It should compile now.
Note: I have modified the patch so that it does not require commit http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.37.y.git;a=commitdiff;h=295124dce4ddfd40b1f12d3ffd2779673e87c701 (which adds support for >512 byte sector sizes in Linux 2.6.37 and later).

I've also attached the original patch for Linux 2.6.37.
Comment by Jonathan Liu (net147) - Saturday, 22 January 2011, 12:22 GMT
@tpowa
The kernel patches fix the issue. Would you be able to add them to the Arch Linux kernel packages?
libata-alignment-2.6.32.patch for core/kernel26 2.6.36.3, core/kernel26-lts 2.6.32.28
libata-alignment.patch for testing/kernel26 2.6.37
