Please read this before reporting a bug:
https://wiki.archlinux.org/title/Bug_reporting_guidelines
Do NOT report bugs when a package is just outdated, or it is in the AUR. Use the 'flag out of date' link on the package page, or the Mailing List.
REPEAT: Do NOT report bugs for outdated packages!
FS#55044 - [qemu] [jemalloc] ceph/rbd backed libvirt VMs experience I/O hang
Attached to Project: Arch Linux
Opened by Jamin Collins (jamincollins) - Saturday, 05 August 2017, 20:49 GMT
Last edited by Sven-Hendrik Haase (Svenstaro) - Sunday, 04 July 2021, 21:16 GMT
Description:
VMs using ceph/rbd backed volumes experience a complete I/O stall when attempting to access the ceph/rbd volume.

Additional info:
* ceph-10.2.5-2

Steps to reproduce:
* configure libvirt for use with ceph[1]
* configure at least one VM drive to be backed by ceph/rbd
* boot the VM
* attempt to access the ceph/rbd backed volume
  - dd if=/dev/zero of=${rbd_backed_device} bs=1M oflag=sync status=progress

Background:
I recently found that I could not restart any of my ceph/rbd backed VMs. They all appeared to hang during boot. Initially, I thought this was due to migrating them to a freshly installed host and possibly missing some configuration step. However, attempting to boot them on an existing host revealed similar hangs during boot. Eventually, I attempted to boot one of the VMs on a third host and found that it worked.

Looking for differences between the hosts, I found that the working host was running my development aur/ceph-git package, while the failing hosts were all running the extras/ceph 10.2.5-2 package. To test this theory, I installed my aur/ceph-git package on one of the failing hosts. After installation, VMs were able to start successfully. Reverting to the extras/ceph package resulted in the same VM boot hangs.

Using the extras/ceph PKGBUILD file as a template, I've successfully compiled 10.2.6 and 10.2.7. Both versions exhibit the same I/O hang behavior. In all cases, replacing the extras/ceph package with the aur/ceph-git build resolves the issue. The specific aur/ceph-git build used is:
ceph-git-1:12.1.0.1018.g171104cb93-1-x86_64.pkg.tar.xz

I'll continue digging into the differences between the two packages and update this report if I find a solution.

[1] - http://docs.ceph.com/docs/master/rbd/libvirt/
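The dd step can also be scripted so that a stall is reported after a timeout instead of leaving the shell hung. This is a minimal Python sketch, assuming a Linux guest (for `os.O_SYNC`); the function name `sync_write_probe`, the 16 MiB default, and the 30-second stall threshold are hypothetical choices, not part of the original report. Like the dd command, it overwrites data at the target path.

```python
import os
import threading

BLOCK_SIZE = 1024 * 1024  # 1 MiB, matching dd's bs=1M


def sync_write_probe(path, blocks=16, stall_timeout=30.0):
    """Write `blocks` MiB of zeros to `path` with O_SYNC (hypothetical helper).

    Returns (completed_blocks, stalled), where `stalled` is True if the
    writer had not finished within `stall_timeout` seconds.
    WARNING: like the dd command, this overwrites data at `path`.
    """
    state = {"done": 0, "error": None}
    zeros = bytes(BLOCK_SIZE)

    def writer():
        fd = None
        try:
            # O_SYNC mirrors dd's oflag=sync: each write returns only after
            # the data has been written through to the underlying device.
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o600)
            for _ in range(blocks):
                view = memoryview(zeros)
                while view:  # os.write() may perform a short write
                    written = os.write(fd, view)
                    view = view[written:]
                state["done"] += 1
        except OSError as exc:
            state["error"] = exc
        finally:
            if fd is not None:
                os.close(fd)

    # Run the writes in a daemon thread so the caller can time out instead
    # of blocking forever; a write stuck in the kernel cannot be cancelled.
    worker = threading.Thread(target=writer, daemon=True)
    worker.start()
    worker.join(stall_timeout)
    if state["error"] is not None:
        raise state["error"]
    return state["done"], worker.is_alive()
```

On a healthy volume, a call such as `sync_write_probe("/dev/vdb")` (device name hypothetical) should return `(16, False)`; with the hang described here, one would expect it to return `stalled=True` with fewer blocks completed, although the blocked write itself stays stuck in the kernel.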
I say this for a few reasons:
1) This use case (backing libvirt VMs with rbd devices) is common for ceph, and the 10.2.5 release has been out since December 2016; a bug like this would have been reported by now.
2) Ubuntu's 16.04 LTS release ships ceph 10.2.7. I've installed and configured an Ubuntu-based virt host (using 10.2.7), and it does not experience this I/O hang.
- Ubuntu with ceph 10.2.7 where it does not hang
- Arch with ceph-git-1:12.1.0.1018.g171104cb93-1 where it does not hang
However, these logs are each over 15 MB compressed.
This report should probably be reassigned to the qemu package.
I do wonder, though, why ceph-git didn't have that problem if it is a qemu issue.
I suspect that quite a bit has changed architecturally between the ceph LTS and development branches. I know they use an entirely different build system.
My CentOS 7 VM has been completely broken by a sudden I/O stop during a system update. I tried to set up a new CentOS VM using virt-manager, but after a few minutes the VM hangs completely during installation, apparently due to an I/O problem. The qemu instance can then only be killed forcibly. This is reliably reproducible on my machine: just grab the latest CentOS ISO and try to install it. I'll try this later on another machine to see whether I run into the same problem there.
[1] https://github.com/archlinux/svntogit-packages/commit/9f34f2cf2afcc81bd32260f272fb4fa1fd4fa68c
Jamin, could you please confirm if the issue still exists?
ceph 14.2.8-1
ceph-libs 14.2.8-1
qemu 5.0.0-7
qemu-block-rbd 5.0.0-7