FS#64150 - [qemu] Massive corruption of qcow2 images with qemu 4.1.0

Attached to Project: Arch Linux
Opened by Milos Buncic (psyhomb) - Wednesday, 16 October 2019, 17:42 GMT
Last edited by Anatol Pomozov (anatolik) - Monday, 11 November 2019, 19:25 GMT
Task Type Bug Report
Category Packages: Extra
Status Closed
Assigned To Anatol Pomozov (anatolik)
Architecture x86_64
Severity Critical
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 8
Private No

Details

Description:

I'm seeing massive corruption of qcow2 images with qemu 4.1.0 after a few savevm/quit/loadvm cycles.

After downgrading from 4.1.0 to 4.0.0, everything runs normally again: no corruption is detected and all qcow2 images stay healthy.

Additional info:
- https://bugs.launchpad.net/qemu/+bug/1847793
- https://bugs.launchpad.net/qemu/+bug/1846427

Closed by  Anatol Pomozov (anatolik)
Monday, 11 November 2019, 19:25 GMT
Reason for closing:  Fixed
Additional comments about closing:  qemu-4.1.0-5
Comment by loqs (loqs) - Wednesday, 16 October 2019, 17:59 GMT
Does reverting https://github.com/qemu/qemu/commit/69f47505ee66afaa513305de0c1895a224e52c45 resolve the issue for you as discussed in the first bug report you link?
Comment by Milos Buncic (psyhomb) - Wednesday, 16 October 2019, 18:15 GMT
Yes, everything is running just fine after downgrading to 4.0.0, but I had to reinstall guest OS.
Comment by loqs (loqs) - Wednesday, 16 October 2019, 21:36 GMT
I should have been precise. I meant reverting just that commit. PKGBUILD attached that reverts just that commit.
The PKGBUILD has been build tested only.
Comment by Adam Kürthy (adee) - Thursday, 17 October 2019, 06:56 GMT
I'm also affected. I think the package in Extra must be reverted to 4.0 until a solution is released upstream.
Comment by Milos Buncic (psyhomb) - Thursday, 17 October 2019, 12:14 GMT
>I meant reverting just that commit. PKGBUILD attached that reverts just that commit.
I haven't tried reverting just that commit; instead I decided to play it safe and downgrade to 4.0.0.
Comment by Ruben Van Boxem (rubenvb) - Thursday, 17 October 2019, 13:42 GMT
I was also just hit by this. Also reverted to qemu-4.0.0-3 (and virglrenderer-0.7.0-1).

I could luckily restore my VM from a backup image.
Comment by Toolybird (Toolybird) - Wednesday, 23 October 2019, 19:08 GMT
Comment by Anatol Pomozov (anatolik) - Wednesday, 23 October 2019, 21:08 GMT
Thank you for the update.

The patchset that is supposed to fix the issue is posted here: https://lists.gnu.org/archive/html/qemu-block/2019-10/msg01414.html

Once the fix is merged upstream, I'll pull it into the [testing] repo.
Comment by Toolybird (Toolybird) - Wednesday, 30 October 2019, 22:25 GMT
Fixes were merged 5 days ago. Would be good to get them into Arch.

I personally wasn't hit by these corruptions (lucky I guess) but I added these patches to my own test build and all seems fine still.

https://git.qemu.org/?p=qemu.git;a=commit;h=944f3d5dd216fcd8cb007eddd4f82dced0a15b3d
https://git.qemu.org/?p=qemu.git;a=commit;h=5e9785505210e2477e590e61b1ab100d0ec22b01

Other corruptions are still being talked about upstream but AFAICT it mainly applies when a qcow2 image is backed by XFS.
Comment by Anatol Pomozov (anatolik) - Thursday, 31 October 2019, 16:35 GMT
Hi folks

I just applied the two patches mentioned above and pushed qemu-4.1.0-3 to [testing]. Please give it a try and let me know if you see any issues with this build.
Comment by Ruben Van Boxem (rubenvb) - Friday, 01 November 2019, 16:05 GMT
I tried my Windows 10 VM twice with qemu-4.1.0-3, but quickly hit corruption and was soon staring at the boot repair startup screen.

For me the -3 package does not fix this problem.
Comment by Anatol Pomozov (anatolik) - Wednesday, 06 November 2019, 17:59 GMT
Ruben, what filesystem do you use on the host? Is it XFS by any chance?
Comment by Ruben Van Boxem (rubenvb) - Wednesday, 06 November 2019, 20:02 GMT
Plain ol' ext4 I fear. The system in question is a Dell XPS 3980 if it makes any difference. I don't remember if I was running the LTS or regular kernel when I tested, not that it really matters I guess.
Comment by Valery (v50110) - Saturday, 09 November 2019, 17:09 GMT
The problem is still there in qemu-4.1.0-4, which was just released. In my case, "qemu-img check" reports many "ERROR cluster XX refcount=YY reference=ZZ" lines for various qcow2 images created by qemu-img from qemu-4.0.0-3. After downgrading to that version, all images check out fine. I.e. a few savevm/quit/loadvm cycles are not required: it is enough to create a disk image with qemu-img from qemu-4.0.0-3 and then check it with qemu-img from 4.1.0-4.
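
For anyone comparing several images, here is a minimal Python sketch that tallies the refcount errors quoted above. It assumes the "ERROR cluster N refcount=X reference=Y" line shape from this comment (the exact wording may vary between qemu versions). The helper names parse_refcount_errors and check_image are made up for illustration, and check_image assumes qemu-img is on PATH.

```python
import re
import subprocess

# Line shape assumed from the comment above; verify against your qemu version.
_ERROR_RE = re.compile(r"^ERROR cluster (\d+) refcount=(\d+) reference=(\d+)")


def parse_refcount_errors(text):
    """Return a (cluster, refcount, reference) tuple for each matching ERROR line."""
    errors = []
    for line in text.splitlines():
        match = _ERROR_RE.match(line.strip())
        if match:
            errors.append(tuple(int(group) for group in match.groups()))
    return errors


def check_image(path):
    """Run `qemu-img check` on one image and return its parsed refcount errors."""
    # qemu-img exits non-zero when it finds problems, so check=True is not used.
    proc = subprocess.run(
        ["qemu-img", "check", path], capture_output=True, text=True
    )
    # Scan both streams, since the errors may not be printed on stdout.
    return parse_refcount_errors(proc.stdout + "\n" + proc.stderr)
```

Running check_image on the same file with both qemu versions installed side by side would show whether the errors come from the image itself or from the checking binary.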
Comment by Toolybird (Toolybird) - Saturday, 09 November 2019, 20:08 GMT
Valery, are your images compressed qcow2 images? Your report sounds similar to a bug I discovered and reported upstream which has now been fixed and will be included in the next release:

https://bugs.launchpad.net/qemu/+bug/1850000

Comment by Valery (v50110) - Saturday, 09 November 2019, 22:23 GMT
Yes, I used the compress option too! It looks like your report describes the same problem I encountered. Thank you for your upstream report! Waiting for the new release :)
Comment by Ruben Van Boxem (rubenvb) - Saturday, 09 November 2019, 22:30 GMT
My images were also compressed so let's hope all this has then been remedied in the upcoming release :)
Comment by Toolybird (Toolybird) - Saturday, 09 November 2019, 22:39 GMT
If Anatol would like to apply the patch to current Arch then here is the upstream commit:

https://git.qemu.org/?p=qemu.git;a=commit;h=24552feb6ae2f615b76c2b95394af43901f75046
Comment by Anatol Pomozov (anatolik) - Sunday, 10 November 2019, 20:45 GMT
Thank you folks for debugging this issue with large compressed images. qemu-4.1.0-5 with the mentioned upstream fix just landed in the [testing] repo. Please give it a try and let me know if you still see the issue.
Comment by Valery (v50110) - Sunday, 10 November 2019, 22:04 GMT
Thanks, Anatol, it looks like the issue is fixed. I checked some of my compressed images with qemu-img from the new qemu-4.1.0-5 and they all look fine.
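
A rough Python sketch of how such a check could be summarized from the machine-readable report of "qemu-img check --output=json". The field names "filename", "corruptions", "leaks" and "check-errors" are assumed from qemu's JSON output and should be verified against the installed version. The function name summarize_check is hypothetical, and treating leaks as non-fatal is a choice made for this sketch.

```python
import json


def summarize_check(report_json):
    """Summarize one JSON report produced by `qemu-img check --output=json`."""
    report = json.loads(report_json)
    # These counters are assumed to be optional in the report; default to 0.
    corruptions = report.get("corruptions", 0)
    leaks = report.get("leaks", 0)
    check_errors = report.get("check-errors", 0)
    return {
        "filename": report.get("filename"),
        "corruptions": corruptions,
        "leaks": leaks,
        "check_errors": check_errors,
        # Leaked clusters waste space but are not counted as damage here.
        "healthy": corruptions == 0 and check_errors == 0,
    }
```

The report text can be captured by running qemu-img with subprocess and passing its standard output to summarize_check.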
Comment by Ruben Van Boxem (rubenvb) - Monday, 11 November 2019, 11:40 GMT
I am getting a lot of
> qcow2_free_clusters failed: Invalid argument
in my log.
Reading the discussion on the bugs and commits, this seems relevant.
Haven't had any corruption though with qemu-4.1.0-5, so that's looking better.
