FS#11141 - Kernel oops on Archlinux guest in Virtualbox 1.6.4
Attached to Project:
Arch Linux
Opened by Mathias Burén (fackamato) - Thursday, 07 August 2008, 10:07 GMT
Last edited by Tobias Powalowski (tpowa) - Saturday, 04 October 2008, 13:07 GMT
Details
I think this is a problem with the Archlinux kernel,
therefore I'm posting this bug here. It can also be seen
at:
http://forums.virtualbox.org/viewtopic.php?t=8499
Description: Kernel oops on an Archlinux guest in Virtualbox 1.6.4. I used the latest ISO install CD, installed the system, enabled the testing repo, ran pacman -Syu, and rebooted. I then ran pacman -S xorg kde, and it began downloading packages. After roughly 6 minutes I got a kernel oops. The kernel was 2.6.26-ARCH; boot options were rootflags=data=writeback noapic nosmp acpi=off. Picture attached.
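When comparing reports it helps to know exactly which boot options a guest is running with. The following is only a minimal sketch, assuming the options are available as a single string (for example the contents of /proc/cmdline inside the guest); the function name parse_cmdline is made up for illustration, and quoted values containing spaces are not handled.

```python
from typing import Dict, Optional


def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {option: value or None}.

    Hypothetical helper, not part of any existing tool. Only the first
    '=' separates key and value, so 'rootflags=data=writeback' keeps
    'data=writeback' as its value.
    """
    params: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # Bare flags such as 'noapic' have no '=' and map to None.
        params[key] = value if sep else None
    return params


# The boot options quoted in the description above.
options = parse_cmdline("rootflags=data=writeback noapic nosmp acpi=off")
print(options)
```

Inside a guest, the same function could be fed with open("/proc/cmdline").read() to confirm that a parameter such as acpi=off was actually applied on that boot.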
At the moment my guest oopses on every reboot, regardless of which boot parameter I use, even with the fallback image.
I will boot the guest from CD and try to rebuild the initrd again.
But this seems to happen only in virtualbox (or in virtual machines in general?). On my laptop with 2.6.26 from testing I have never seen such a thing. There is also nothing similar on the forums etc. from users on "real" machines.
I have a trace in /var/log/everything.log from the oops in the guest, and also a screenshot of the boot trap. If this is useful to you I can post/attach it.
Doing the same things as before (makepkg virtualbox-modules) leads to an oops again, and to an unbootable guest again.
My archlinux guest also crashed in virtualbox with kernel 2.6.26.
Disabling acpi helped to boot most of the times, but the system was still highly unstable and *always* crashed eventually.
Come to think of it, it also happened very often while some IO actions were happening, which confirms what Mathias said.
I rebuilt both 2.6.26 and later 2.6.26.3 with a minimal config, and it has been working perfectly. I don't know which config changes were relevant.
As with Gerbra, the crash during boot happened directly after "Freeing SMP alternatives", so I disabled SMP in my minimal config. Maybe this did it?
If anyone could confirm this, it would be interesting. I am not able to do it at the moment.
I found a ticket with some information on the virtualbox bug tracker:
http://www.virtualbox.org/ticket/1875
virtualbox was not the OSE version. There we haven't had any oops or crash with 2.6.26. Perhaps some tests could help to isolate the problem. Does anyone have this problem on a non-OSE version? Does anyone have this problem on x86_64 (OSE or full version)?
From my few tests, I think that enabling VT-x can sometimes help against the oops during boot, but not against the oops under heavy IO.
Some "facts":
Enabling VT-x or disabling acpi in the guest settings (and using noacpi as guest kernel parameter) always gets past the early oops after "Freeing SMP".
Here the oops could always be reproduced during package installation in the installer (heavy disk IO). The oops goes to dmesg; package installation continues but ends with FAILED. It seems that nothing gets written to the HD after the first trap.
I have tested several filesystems.
If I shut down the guest (HOST+Q -> shutdown virtual machine) I often get an error dialog from virtualbox (logfile and screenshot).
Currently I cannot get this logfile (I have several vbox logs of > 50MB) and it seems impossible to search them for the exact place where the fault happens. If I can reproduce this virtualbox error dialog I will isolate it.
I attach the guest dmesg and messages.log, maybe this is useful.
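For logs that are too large to read by hand, one option is to stream the file and keep only the lines around each fault. The sketch below is an illustration under stated assumptions, not a tested tool for vbox logs: the function name find_oops_context is made up, and the default marker is simply the "BUG: unable to handle kernel" message that shows up in the guest traces in this report. A virtualbox host log would need a different marker string.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Optional

# Assumed marker: the message seen in the guest oops traces in this report.
OOPS_MARKER = "BUG: unable to handle kernel"


def find_oops_context(lines: Iterable[str], marker: str = OOPS_MARKER,
                      before: int = 5, after: int = 20) -> Iterator[List[str]]:
    """Yield one block per marker hit: `before` lines, the hit, `after` lines.

    Lines are consumed one at a time, so a log of 50MB or more is never
    loaded into memory as a whole.
    """
    history: Deque[str] = deque(maxlen=before)  # rolling window of earlier lines
    block: Optional[List[str]] = None
    remaining = 0
    for raw in lines:
        line = raw.rstrip("\n")
        if block is None and marker in line:
            # Start a new block with the lines seen just before the hit.
            block = list(history) + [line]
            remaining = after
        elif block is not None:
            block.append(line)
            remaining -= 1
        if block is not None and remaining == 0:
            yield block
            block = None
        history.append(line)
    if block is not None:
        yield block  # the log ended before the trailing context was complete


sample = ["a", "b", "c", "BUG: unable to handle kernel paging request", "t1", "t2", "d"]
for found in find_oops_context(sample, before=2, after=2):
    print("\n".join(found))
```

To run it on a real file, pass an open file object instead of the sample list, for example open("/var/log/everything.log", errors="replace") inside the guest.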
"Does anyone have this problem on a non-OSE version? Does anyone have this problem on x86_64 (OSE or full version)?"
Yes, I'm experiencing this problem with the non-OSE on x86_64.
It's a very weird problem - intermittent. Sometimes it boots; other times it fails with the "BUG: unable to handle kernel" message. And other times it boots, runs successfully for a while, and then segfaults on something later on (e.g., most recently when I was building the virtualbox-ose-additions package). And yes, when it segfaults, the trace does seem to be in some storage/file system code. Weird.
It builds the whole source, and only keeps like 1% of it.
Instead, you can just build that 1%, and have a build 100 times faster for the same result :P
(If you are wondering: yes, I sent these to the maintainer (Bash) 10 days ago. He was on holiday at the time.)
virtualbox-ose-additions-modu... (1.2 KiB)
Bash actually updated them a few days ago, but I confused him by sending him several versions of virtualbox-ose-additions.
He fixed it and it all looks great now, so you can simply use the packages from community and the PKGBUILDs from abs.
"I have added noreplace-paravirt as a boot param..."
I don't think this can be a solution. I tested it with the multi-arch ISO (2.6.26), and with this parameter it is extremely slow and the ISO doesn't boot (the initrd doesn't find the cdrom after the udev hook).
The "problem" with all these kernel parameters and virtualbox settings is IMHO: there was never a problem with kernels < 2.6.26. No tricks were needed. With the same virtualbox version on my host, these errors only occur once the guests switch to 2.6.26 (on Arch guests through a pacman update, or for the install ISO when using the 2008.08-froscon ISO with 2.6.26 as kernel).
And when this error shows up on vbox-ose, vbox-sun, i686 and x86_64 (seen from the host side), then in my opinion it's a kernel bug (or a kernel change for which Sun must modify their virtualbox code). Qemu, for example, does not have this problem.
http://lkml.org/lkml/2008/8/20/359
I like lkml.org, but I had a bad experience with it when trying to track all the answers in a thread. marc or gmane worked better:
http://marc.info/?t=121926107200003&r=1&w=2
http://thread.gmane.org/gmane.linux.kernel/724038/focus=725005
So there have been quite a few exchanges, but unfortunately no real progress yet.
We are doing further testing (it's related to 2.6.26 irq state handling, as far as I understand, since I'm not a developer. And my C experience is much uglier than my English.. <g>).
Also, these guys must work out whether the changes might have side effects on "real" architectures. Linux *only* on virtualbox is not a goal...
I think I can post a patch for the Arch stock kernel tomorrow that some of you might like to test. You will need to build a new kernel for your guests.
By the way, being a C developer is one thing, learning and understanding the kernel internals is another :)
Finally, it seems like the last patch you had to test did not work. At this point, I don't think you need to do wider testing; I think you rather have to wait for other ideas/proposals.
Anyway, thanks again for all your effort and keep up the good testing, it will hopefully lead somewhere :)
Short testing here gave me a significant improvement (mostly for the early oops after freeing SMP..). But I could still reproduce the kernel panic under heavy disk IO. I'll test more later this evening...
My vanilla 2.6.26.3 with custom config (attached) still works perfectly fine in all situations.
Any chance you could post a kernel26 i686 package built using your custom config? I've only got x86_64 boxes anymore and so can't build an i686 package. (I could obviously try to build it inside a vbox guest, but that would most likely crash with a kernel oops due to this very problem.)
It is still there :
from http://wiki.archlinux.org/index.php/Downgrade_packages
http://ftp.tu-chemnitz.de/pub/linux/sunsite.unc-mirror/distributions/archlinux/core/os/i686/kernel26-2.6.25.11-1-i686.pkg.tar.gz
Well, that ought to do the trick then, until this is fixed. Thanks much for the tip.
The good news: It is fixed in VirtualBox 2.0.2. Last comment here: http://www.virtualbox.org/ticket/1875
So maybe we could close this in the next few days?