FS#29692 - [linux] 3.3.4-1-ARCH crashes as KVM kernel

Attached to Project: Arch Linux
Opened by Tobias Hunger (hunger) - Tuesday, 01 May 2012, 12:28 GMT
Last edited by Tobias Powalowski (tpowa) - Sunday, 15 July 2012, 09:14 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To Tobias Powalowski (tpowa)
Thomas Bächler (brain0)
Architecture x86_64
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:
After upgrading my KVM virtual machines to linux 3.3.4-1-ARCH, they crash constantly. The crashes seem random:
sometimes the machines do not even finish booting, at other times they crash a while after boot. None of them survives
longer than about 10 minutes.

All these machines were very stable before the last upgrade (though I might have been running a somewhat older
kernel (3.2.x), since I had not rebooted them in a while) :-)

Additional info:
* uname -a gives: Linux box 3.3.4-1-ARCH #1 SMP PREEMPT Sat Apr 28 00:21:22 CEST 2012 x86_64 AMD Athlon(tm) 64 X2 Dual Core Processor 6000+ AuthenticAMD GNU/Linux

Steps to reproduce:
1. Boot
2. wait

Closed by Tobias Powalowski (tpowa)
Sunday, 15 July 2012, 09:14 GMT
Reason for closing: Fixed
Comment by Dave Reisner (falconindy) - Tuesday, 01 May 2012, 12:40 GMT
Cannot reproduce. I've got more than a half dozen VMs running fine on 3.3.4. Perhaps you can post an actual error? Simply "crash" is way too vague for a bug report you're marking as critical.
Comment by Tobias Hunger (hunger) - Tuesday, 01 May 2012, 12:47 GMT
A screenshot of the issue.

The kernel works fine on real hardware.
Comment by Tobias Hunger (hunger) - Tuesday, 01 May 2012, 12:56 GMT
I was just able to salvage parts of /var/log/messages. There are lots of these in it:

May 1 14:42:39 boba kernel: [ 108.472371] Modules linked in: des_generic ecb md4 md5 hmac nls_utf8 cifs fscache ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_tcpudp ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables evdev serio_raw psmouse pcspkr virtio_balloon i2c_piix4 virtio_net i2c_core sr_mod cdrom floppy processor button ext4 crc16 jbd2 mbcache uhci_hcd usbcore usb_common ata_piix libata scsi_mod virtio_blk virtio_pci
May 1 14:42:39 boba kernel: [ 108.472403] Pid: 766, comm: git Tainted: G B 3.3.4-1-ARCH #1
May 1 14:42:39 boba kernel: [ 108.472405] Call Trace:
May 1 14:42:39 boba kernel: [ 108.472413] [<c01e3167>] bad_page+0xa7/0xf0
May 1 14:42:39 boba kernel: [ 108.472416] [<c01e377d>] get_page_from_freelist+0x47d/0x520
May 1 14:42:39 boba kernel: [ 108.472419] [<c01e3f31>] __alloc_pages_nodemask+0xf1/0x700
May 1 14:42:39 boba kernel: [ 108.472423] [<c0219372>] ? kmem_cache_alloc_trace+0x102/0x110
May 1 14:42:39 boba kernel: [ 108.472432] [<e0e94ccc>] ? cifs_fscache_set_inode_cookie+0x2c/0x120 [cifs]
May 1 14:42:39 boba kernel: [ 108.472435] [<c01e7150>] __do_page_cache_readahead+0xe0/0x210
May 1 14:42:39 boba kernel: [ 108.472438] [<c01e7506>] ra_submit+0x26/0x30
May 1 14:42:39 boba kernel: [ 108.472440] [<c01e7647>] ondemand_readahead+0x137/0x230
May 1 14:42:39 boba kernel: [ 108.472446] [<c029ac08>] ? security_dentry_open+0x78/0x80
May 1 14:42:39 boba kernel: [ 108.472449] [<c01e781b>] page_cache_sync_readahead+0x3b/0x60
May 1 14:42:39 boba kernel: [ 108.472453] [<c01def5a>] generic_file_aio_read+0x4aa/0x710
May 1 14:42:39 boba kernel: [ 108.472456] [<c02445fd>] ? mntput+0x1d/0x30
May 1 14:42:39 boba kernel: [ 108.472459] [<c023796b>] ? path_openat+0xcb/0x350
May 1 14:42:39 boba kernel: [ 108.472464] [<c022947f>] do_sync_read+0xaf/0xf0
May 1 14:42:39 boba kernel: [ 108.472466] [<c0237d01>] ? do_filp_open+0x31/0x80
May 1 14:42:39 boba kernel: [ 108.472469] [<c029aa14>] ? security_file_permission+0x94/0xb0
May 1 14:42:39 boba kernel: [ 108.472472] [<c0229b41>] ? rw_verify_area+0x61/0x120
May 1 14:42:39 boba kernel: [ 108.472474] [<c023406b>] ? putname+0x2b/0x40
May 1 14:42:39 boba kernel: [ 108.472477] [<c02293d0>] ? do_sync_write+0xf0/0xf0
May 1 14:42:39 boba kernel: [ 108.472479] [<c022a035>] vfs_read+0x85/0x160
May 1 14:42:39 boba kernel: [ 108.472482] [<c02293d0>] ? do_sync_write+0xf0/0xf0
May 1 14:42:39 boba kernel: [ 108.472484] [<c022a14d>] sys_read+0x3d/0x80
May 1 14:42:39 boba kernel: [ 108.472488] [<c04ae6df>] sysenter_do_call+0x12/0x28

Sorry for the vague first report, I never had to report a kernel crash before and am still looking up instructions on what to do:-)
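
For anyone collecting similar traces, here is a minimal sketch for counting how often the bad_page frame shows up in a saved log. The function name count_bad_page is made up for illustration, and the sketch assumes the log lines look like the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper (not part of any package): count the log lines on
# stdin that contain a bad_page stack frame, as in the trace above.
count_bad_page() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # nothing matches, so "|| true" keeps the function usable under set -e.
    grep -c 'bad_page+0x' || true
}
```

It could be used as `count_bad_page < /var/log/messages`, assuming the kernel messages are logged to that file as they are on this machine.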
Comment by Tobias Hunger (hunger) - Tuesday, 01 May 2012, 13:18 GMT
Just quadrupled the RAM assigned to each machine and now all of them are way more stable.

I guess with such a workaround this is no longer critical... how can I reduce the severity of the issue?
Comment by Jelle van der Waa (jelly) - Tuesday, 01 May 2012, 13:19 GMT
Give some info about the command line parameters you're using and the loaded modules
Comment by Dave Reisner (falconindy) - Tuesday, 01 May 2012, 13:22 GMT
Yes, that seemed obvious from the kswapd crash. Does disabling transparent huge pages help?

echo never >/sys/kernel/mm/transparent_hugepage/enabled
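
To confirm the setting took effect, the same file can be read back; the kernel marks the active mode with square brackets, for example "always madvise [never]". A minimal sketch for extracting that bracketed word follows. The function name thp_active_mode is an assumption for illustration, not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: print the active transparent hugepage mode, i.e.
# the bracketed word, from text in the format of
# /sys/kernel/mm/transparent_hugepage/enabled. Reads from stdin.
thp_active_mode() {
    # Keep only the lowercase word inside [ ]; print nothing if no
    # bracketed word is found.
    sed -n 's/.*\[\([a-z]*\)\].*/\1/p'
}
```

Usage would be `thp_active_mode < /sys/kernel/mm/transparent_hugepage/enabled`, which should report "never" after the echo above. The setting does not persist across reboots.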
Comment by Tobias Hunger (hunger) - Tuesday, 01 May 2012, 18:30 GMT
Just quadrupled the RAM assigned to each machine and now all of them are way more stable.

I guess with such a workaround this is no longer critical... how can I reduce the severity of the issue?
Comment by Dave Reisner (falconindy) - Tuesday, 01 May 2012, 18:34 GMT
Yes, this is the third time you posted that. I already reduced the severity.
Comment by Tobias Hunger (hunger) - Tuesday, 01 May 2012, 19:10 GMT
# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-linux root=/dev/disk/by-uuid/some-uuid ro add_efi_memmap

# cat /tmp/modules
Module Size Used by
ip6t_REJECT 2408 3
nf_conntrack_ipv6 5316 4
nf_defrag_ipv6 4913 1 nf_conntrack_ipv6
ip6table_filter 1032 1
ip6_tables 10333 1 ip6table_filter
xt_tcpudp 1843 4
ipt_REJECT 1957 3
nf_conntrack_ipv4 5347 5
nf_defrag_ipv4 1015 1 nf_conntrack_ipv4
xt_conntrack 2709 9
nf_conntrack 49000 3 xt_conntrack,nf_conntrack_ipv4,nf_conntrack_ipv6
iptable_filter 1060 1
ip_tables 9154 1 iptable_filter
x_tables 11893 8 ip_tables,iptable_filter,xt_conntrack,ipt_REJECT,xt_tcpudp,ip6_tables,ip6table_filter,ip6t_REJECT
serio_raw 3709 0
psmouse 69902 0
pcspkr 1423 0
evdev 7310 2
i2c_piix4 7148 0
virtio_balloon 3964 0
virtio_net 11190 0
i2c_core 16845 1 i2c_piix4
sr_mod 13148 0
cdrom 30504 1 sr_mod
floppy 49063 0
processor 23476 0
button 3614 0
ext4 384668 2
crc16 1091 1 ext4
jbd2 60590 1 ext4
mbcache 4345 1 ext4
uhci_hcd 19712 0
usbcore 122751 2 uhci_hcd
usb_common 622 1 usbcore
ata_piix 18616 0
virtio_blk 5541 4
libata 145775 1 ata_piix
scsi_mod 112765 2 libata,sr_mod
virtio_pci 6163 0

It is a pretty standard KVM installation with very few custom tweaks.

Disabling transparent_hugepage seems to reduce the likelihood of a crash:
I still get lots of that output in /var/log/messages under memory
pressure, but the VM stops far more rarely. It still does happen, though.
Comment by Tobias Powalowski (tpowa) - Wednesday, 02 May 2012, 19:12 GMT
should be fixed in 3.3.4-2
Comment by Tobias Hunger (hunger) - Saturday, 12 May 2012, 08:07 GMT
The situation seems to have improved a lot, but I am still seeing the issue about once a week, even in linux 3.3.5-1.
Comment by Tobias Powalowski (tpowa) - Tuesday, 12 June 2012, 06:54 GMT
Status on 3.4.x?
Comment by Tobias Hunger (hunger) - Saturday, 14 July 2012, 20:08 GMT
3.4.x seems to work fine. Thanks!
