FS#66216 - ZFS 0.8.3-1 blocks IO for some time when used intensively by qemu-img convert

Attached to Project: Arch Linux
Opened by Oliver Dzombic (layer7gmbh) - Saturday, 11 April 2020, 16:42 GMT
Last edited by freswa (frederik) - Saturday, 11 April 2020, 16:47 GMT
Task Type: Bug Report
Category: Kernel
Status: Closed
Assigned To: No-one
Architecture: x86_64
Severity: Medium
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 0
Private: No

Details

Description:

While running:

qemu-img convert -p -O raw vmware-test2.vmdk /dev/zvol/kvm-storage/test2

after some time I see the following:

# iotop -d 2 -P -a

Total DISK READ : 0.00 B/s | Total DISK WRITE : 42.89 M/s
Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 31.66 M/s
PID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
1300 be/4 root 0.00 B 0.00 B 0.00 % 82.42 % [txg_sync]
1515051 be/4 nobody 0.00 B 0.00 B 0.00 % 43.13 % [qemu-system-x86]
1626416 be/4 root 0.00 B 0.00 B 0.00 % 28.44 % [kworker/u65:4-events_power_efficient]
3962782 be/4 root 0.00 B 0.00 B 0.00 % 23.73 % [kworker/u65:0+events_freezable_power_]
2619469 be/4 root 0.00 B 0.00 B 0.00 % 5.61 % [kworker/u65:1+events_freezable_power_]
1591517 be/4 root 0.00 B 0.00 B 0.00 % 5.53 % [kworker/u65:3-events_freezable_power_]
1600404 be/4 root 0.00 B 0.00 B 0.00 % 0.72 % [kworker/u66:1-events_power_efficient]
1552332 be/4 root 0.00 B 0.00 B 0.00 % 0.53 % [kworker/u66:2-events_unbound]
4003168 be/4 root 0.00 B 0.00 B 0.00 % 0.38 % [kworker/u66:3-events_unbound]
713 be/4 root 0.00 B 0.00 B 0.00 % 0.03 % [l2arc_feed]
1165 be/0 root 0.00 B 0.00 B 0.00 % 0.01 % [z_null_iss]
1189 be/0 root 0.00 B 0.00 B 0.00 % 0.01 % [z_wr_iss]
1208 be/0 root 0.00 B 0.00 B 0.00 % 0.01 % [z_wr_int]
1203 be/0 root 0.00 B 0.00 B 0.00 % 0.01 % [z_wr_int]
1205 be/0 root 0.00 B 0.00 B 0.00 % 0.00 % [z_wr_int]
497 be/3 root 0.00 B 8.00 K 0.00 % 0.00 % [jbd2/sda1-8]
530 be/4 root 0.00 B 272.00 K 0.00 % 0.00 % systemd-journald
667 be/0 root 0.00 B 85.50 M 0.00 % 0.00 % [zvol]
3816186 be/0 root 0.00 B 85.63 M 0.00 % 0.00 % [zvol]
[...]

followed by:

# dmesg

[19906.081791] INFO: task qemu-img:2206615 blocked for more than 122 seconds.
[19906.081795] Tainted: P OE 5.4.31-1-lts #1
[19906.081796] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[19906.081799] qemu-img D 0 2206615 3613123 0x00000080
[19906.081805] Call Trace:
[19906.081828] ? __schedule+0x2f3/0x750
[19906.081834] ? bit_wait_timeout+0x90/0x90
[19906.081838] schedule+0x39/0xa0
[19906.081844] io_schedule+0x12/0x40
[19906.081849] bit_wait_io+0xd/0x50
[19906.081854] __wait_on_bit+0x2a/0x90
[19906.081860] out_of_line_wait_on_bit+0x92/0xb0
[19906.081868] ? var_wake_function+0x20/0x20
[19906.081876] __block_write_begin_int+0x258/0x660
[19906.081885] ? blkdev_direct_IO+0x4d0/0x4d0
[19906.081890] ? blkdev_direct_IO+0x4d0/0x4d0
[19906.081894] block_write_begin+0x48/0xf0
[19906.081904] generic_perform_write+0xd1/0x1d0
[19906.081911] __generic_file_write_iter+0xf9/0x1d0
[19906.081917] blkdev_write_iter+0xd9/0x160
[19906.081926] new_sync_write+0x16a/0x210
[19906.081932] vfs_write+0xb6/0x1a0
[19906.081937] ksys_pwrite64+0x65/0xa0
[19906.081946] do_syscall_64+0x4e/0x140
[19906.081953] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[19906.081958] RIP: 0033:0x7f7add9d807f
[19906.081970] Code: Bad RIP value.
[19906.081971] RSP: 002b:00007f7adbffc100 EFLAGS: 00000293 ORIG_RAX: 0000000000000012
[19906.081975] RAX: ffffffffffffffda RBX: 00007f7ad6ebe000 RCX: 00007f7add9d807f
[19906.081977] RDX: 0000000000200000 RSI: 00007f7ad6ebe000 RDI: 000000000000000a
[19906.081978] RBP: 00007f7ad91d7790 R08: 0000000000000000 R09: 00000000ffffffff
[19906.081980] R10: 00000004c181ee00 R11: 0000000000000293 R12: 0000000000000000
[19906.081982] R13: 00007f7adcc52078 R14: 00007f7adcc8baf0 R15: 00007f7adcc52000


At this point I saw that a running qemu instance was stuck. I tried to restart it:

# virsh destroy testvm1
error: Failed to destroy domain testvm1
error: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainReset)

followed by:

[20260.778955] device testvm1 left promiscuous mode
[20260.779060] audit: type=1700 audit(1586621892.604:86): dev=testvm1 prom=0 old_prom=256 auid=4294967295 uid=65534 gid=992 ses=4294967295
[20397.591743] INFO: task systemd-udevd:548 blocked for more than 122 seconds.
[20397.591748] Tainted: P OE 5.4.31-1-lts #1
[20397.591749] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[20397.591752] systemd-udevd D 0 548 1 0x00000180
[20397.591758] Call Trace:
[20397.591780] ? __schedule+0x2f3/0x750
[20397.591785] schedule+0x39/0xa0
[20397.591798] schedule_preempt_disabled+0xa/0x10
[20397.591802] __mutex_lock.isra.0+0x19b/0x500
[20397.591813] __blkdev_get+0x7f/0x550
[20397.591817] ? blkdev_get_by_dev+0x50/0x50
[20397.591824] do_dentry_open+0x13d/0x3a0
[20397.591829] path_openat+0x5d0/0x1650
[20397.591835] do_filp_open+0xab/0x120
[20397.591841] do_sys_open+0x19e/0x240
[20397.591851] do_syscall_64+0x4e/0x140
[20397.591857] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[20397.591863] RIP: 0033:0x7f1476f4222b
[20397.591875] Code: Bad RIP value.
[20397.591877] RSP: 002b:00007fffd0a37da0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[20397.591880] RAX: ffffffffffffffda RBX: 00007fffd0a37eb0 RCX: 00007f1476f4222b
[20397.591882] RDX: 00000000000a0800 RSI: 000055eeaf6bec30 RDI: 00000000ffffff9c
[20397.591884] RBP: 000055eeaf6bec30 R08: 00007f1476dc35b0 R09: 0000000000000000
[20397.591885] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000000a0800
[20397.591887] R13: 000055eeaf39c000 R14: 000055eeaf79e740 R15: 00007fffd0a37e78

followed by:

[20520.471723] INFO: task systemd-udevd:548 blocked for more than 245 seconds.
[20520.471728] Tainted: P OE 5.4.31-1-lts #1
[20520.471729] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[20520.471732] systemd-udevd D 0 548 1 0x00000184
[20520.471738] Call Trace:
[20520.471759] ? __schedule+0x2f3/0x750
[20520.471764] schedule+0x39/0xa0
[20520.471768] schedule_preempt_disabled+0xa/0x10
[20520.471772] __mutex_lock.isra.0+0x19b/0x500
[20520.471780] __blkdev_get+0x7f/0x550
[20520.471784] ? blkdev_get_by_dev+0x50/0x50
[20520.471790] do_dentry_open+0x13d/0x3a0
[20520.471795] path_openat+0x5d0/0x1650
[20520.471801] do_filp_open+0xab/0x120
[20520.471807] do_sys_open+0x19e/0x240
[20520.471815] do_syscall_64+0x4e/0x140
[20520.471820] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[20520.471825] RIP: 0033:0x7f1476f4222b
[20520.471837] Code: Bad RIP value.
[20520.471839] RSP: 002b:00007fffd0a37da0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[20520.471843] RAX: ffffffffffffffda RBX: 00007fffd0a37eb0 RCX: 00007f1476f4222b
[20520.471845] RDX: 00000000000a0800 RSI: 000055eeaf6bec30 RDI: 00000000ffffff9c
[20520.471847] RBP: 000055eeaf6bec30 R08: 00007f1476dc35b0 R09: 0000000000000000
[20520.471848] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000000a0800
[20520.471850] R13: 000055eeaf39c000 R14: 000055eeaf79e740 R15: 00007fffd0a37e78
[20569.913134] audit: type=1701 audit(1586622201.744:87): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=548 comm="systemd-udevd" exe="/usr/lib/systemd/systemd-udevd" sig=6 res=1
[20569.962435] audit: type=1130 audit(1586622201.794:88): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-coredump@0-583936-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[20570.099826] audit: type=2500 audit(1586622201.924:89): pid=833 uid=0 auid=4294967295 ses=4294967295 msg='virt=kvm op=stop reason=failed vm="testvm1" uuid=5fe5a2ac-afc3-4db8-8a84-4f4096fee9dc vm-pid=-1 exe="/usr/bin/libvirtd" hostname=? addr=? terminal=? res=success'
[20570.394184] audit: type=1131 audit(1586622202.224:90): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-udevd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
[20570.395190] audit: type=1130 audit(1586622202.224:91): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-udevd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[20570.395202] audit: type=1131 audit(1586622202.224:92): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-udevd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[20570.515783] audit: type=1131 audit(1586622202.344:93): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-coredump@0-583936-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[20570.791557] audit: type=1130 audit(1586622202.614:94): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-udevd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'


The qemu-img process eventually continued and finished its work successfully:

qemu-img convert -p -O raw vmware-test2.vmdk /dev/zvol/kvm-storage/test2
(100.00/100%)


I was able to start testvm1 again and also to start a new KVM instance backed by the new test2 ZFS volume.

So it seems to me that ZFS blocked IO completely for some time.

testvm1 is 350 GB and was converted without issues.
test2 is 50 GB and was converted while a KVM instance using the already-converted testvm1 was running.
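For reference, two things that might narrow this down, sketched below. `zfs_dirty_data_max` is a standard OpenZFS module parameter and `-t none` is a standard qemu-img cache option, but the concrete value used and whether either actually avoids this particular stall are assumptions on my part:

```shell
# Sketch, not verified on this system.
# 1) Cap how much dirty data ZFS buffers before txg_sync has to flush it
#    (zfs_dirty_data_max is in bytes; 1 GiB is an illustrative value).
DIRTY_MAX=$((1024 * 1024 * 1024))
echo "$DIRTY_MAX"
# To apply, as root:
#   echo "$DIRTY_MAX" > /sys/module/zfs/parameters/zfs_dirty_data_max

# 2) Open the target zvol with O_DIRECT so qemu-img bypasses the page cache,
#    which is where the hung task above is blocked (block_write_begin):
#   qemu-img convert -p -t none -O raw vmware-test2.vmdk /dev/zvol/kvm-storage/test2
```

The second point is suggested by the stack trace itself: qemu-img was stuck in buffered block-device writeback, not inside ZFS code.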


Additional info:
* package version(s):

# uname -sv
Linux #1 SMP Wed, 08 Apr 2020 08:29:49 +0000

# zfs --version
zfs-0.8.3-1
zfs-kmod-0.8.3-1

# virsh -V
Virsh command line tool of libvirt 5.10.0

# qemu-system-x86_64 --version
QEMU emulator version 4.2.0



* config and/or log files etc.

# zpool status -v
  pool: kvm-storage
 state: ONLINE
  scan: none requested
config:

        NAME                                      STATE     READ WRITE CKSUM
        kvm-storage                               ONLINE       0     0     0
          raidz1-0                                ONLINE       0     0     0
            pci-0000:82:00.0-scsi-0:1:23:0        ONLINE       0     0     0
            pci-0000:82:00.0-scsi-0:1:24:0        ONLINE       0     0     0
            pci-0000:82:00.0-scsi-0:1:26:0        ONLINE       0     0     0
          raidz1-1                                ONLINE       0     0     0
            pci-0000:82:00.0-scsi-0:1:27:0        ONLINE       0     0     0
            pci-0000:82:00.0-scsi-0:1:28:0        ONLINE       0     0     0
            pci-0000:82:00.0-scsi-0:1:29:0        ONLINE       0     0     0
        logs
          mirror-2                                ONLINE       0     0     0
            pci-0000:82:00.0-scsi-0:1:30:0-part1  ONLINE       0     0     0
            pci-0000:82:00.0-scsi-0:1:31:0-part1  ONLINE       0     0     0
        cache
          pci-0000:82:00.0-scsi-0:1:30:0-part2    ONLINE       0     0     0
          pci-0000:82:00.0-scsi-0:1:31:0-part2    ONLINE       0     0     0

errors: No known data errors


Closed by freswa (frederik)
Saturday, 11 April 2020, 16:47 GMT
Reason for closing:  Won't fix
Additional comments about closing:  zfs is not in the official repositories
