FS#70992 - bvec_alloc crash with kernel 5.12.5-arch1-1
Attached to Project:
Arch Linux
Opened by Jens Stutte (jensstutte) - Saturday, 22 May 2021, 19:23 GMT
Last edited by Toolybird (Toolybird) - Tuesday, 06 June 2023, 03:21 GMT
Details
Description:
After upgrading to kernel 5.12.5-arch1-1, I experience frequent hangs and found the following in my journalctl:

~~~
Mai 21 19:09:06 vdr kernel: ------------[ cut here ]------------
Mai 21 19:09:06 vdr kernel: kernel BUG at block/bio.c:52!
Mai 21 19:09:06 vdr kernel: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
Mai 21 19:09:06 vdr kernel: CPU: 13 PID: 272 Comm: kworker/u64:4 Not tainted 5.12.5-arch1-1 #1
Mai 21 19:09:06 vdr kernel: Hardware name: ASUS System Product Name/TUF GAMING B550M-PLUS, BIOS 1804 02/02/2021
Mai 21 19:09:06 vdr kernel: Workqueue: writeback wb_workfn (flush-9:0)
Mai 21 19:09:06 vdr kernel: RIP: 0010:biovec_slab.part.0+0x5/0x10
Mai 21 19:09:06 vdr kernel: Code: 81 18 63 00 48 8b 6b f0 48 85 ed 75 ca 5b 4c 89 e7 5d 41 5c e9 4c 18 63 00 48 c7 43 f8 00 00 00 00 eb c1 66 90 0f 1f 44 00 00 <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 4>
Mai 21 19:09:06 vdr kernel: RSP: 0018:ffffb37500f27620 EFLAGS: 00010202
Mai 21 19:09:06 vdr kernel: RAX: 00000000000000bf RBX: ffffb37500f27654 RCX: 0000000000000100
Mai 21 19:09:06 vdr kernel: RDX: 0000000000000c00 RSI: ffffb37500f27654 RDI: ffff970080e9dc38
Mai 21 19:09:06 vdr kernel: RBP: 0000000000000c00 R08: ffff970080e9dc38 R09: ffff970109e06a00
Mai 21 19:09:06 vdr kernel: R10: 0000000000000004 R11: ffffb37500f27788 R12: ffff970080e9dc38
Mai 21 19:09:06 vdr kernel: R13: 0000000000000c00 R14: 0000000000000c00 R15: ffff970080e9dbf0
Mai 21 19:09:06 vdr kernel: FS: 0000000000000000(0000) GS:ffff97078ed40000(0000) knlGS:0000000000000000
Mai 21 19:09:06 vdr kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mai 21 19:09:06 vdr kernel: CR2: 00007fa7679f3010 CR3: 000000014c682000 CR4: 0000000000350ee0
Mai 21 19:09:06 vdr kernel: Call Trace:
Mai 21 19:09:06 vdr kernel: bvec_alloc+0x90/0xc0
Mai 21 19:09:06 vdr kernel: bio_alloc_bioset+0x1b3/0x260
Mai 21 19:09:06 vdr kernel: raid1_make_request+0x9ce/0xc50 [raid1]
Mai 21 19:09:06 vdr kernel: ? __bio_clone_fast+0xa8/0xe0
Mai 21 19:09:06 vdr kernel: md_handle_request+0x158/0x1d0 [md_mod]
Mai 21 19:09:06 vdr kernel: md_submit_bio+0xcd/0x110 [md_mod]
Mai 21 19:09:06 vdr kernel: submit_bio_noacct+0x139/0x530
Mai 21 19:09:06 vdr kernel: ? __test_set_page_writeback+0x89/0x2d0
Mai 21 19:09:06 vdr kernel: submit_bio+0x78/0x1d0
Mai 21 19:09:06 vdr kernel: ext4_bio_write_page+0x1fd/0x630 [ext4]
Mai 21 19:09:06 vdr kernel: mpage_submit_page+0x46/0x80 [ext4]
Mai 21 19:09:06 vdr kernel: ext4_writepages+0x9ed/0x1170 [ext4]
Mai 21 19:09:06 vdr kernel: ? do_writepages+0x41/0x100
Mai 21 19:09:06 vdr kernel: do_writepages+0x41/0x100
Mai 21 19:09:06 vdr kernel: ? __wb_calc_thresh+0x4b/0x140
Mai 21 19:09:06 vdr kernel: __writeback_single_inode+0x3d/0x310
Mai 21 19:09:06 vdr kernel: ? wbc_detach_inode+0x13f/0x210
Mai 21 19:09:06 vdr kernel: writeback_sb_inodes+0x1fc/0x480
Mai 21 19:09:06 vdr kernel: __writeback_inodes_wb+0x4c/0xe0
Mai 21 19:09:06 vdr kernel: wb_writeback+0x22e/0x320
Mai 21 19:09:06 vdr kernel: wb_workfn+0x392/0x5c0
Mai 21 19:09:06 vdr kernel: process_one_work+0x214/0x3e0
Mai 21 19:09:06 vdr kernel: worker_thread+0x4d/0x3d0
Mai 21 19:09:06 vdr kernel: ? process_one_work+0x3e0/0x3e0
Mai 21 19:09:06 vdr kernel: kthread+0x133/0x150
Mai 21 19:09:06 vdr kernel: ? kthread_associate_blkcg+0xc0/0xc0
Mai 21 19:09:06 vdr kernel: ret_from_fork+0x22/0x30
Mai 21 19:09:06 vdr kernel: Modules linked in: cfg80211 8021q garp mrp stp llc nct6775 mousedev joydev intel_rapl_msr intel_rapl_common amdgpu edac_mce_amd snd_hda_codec_realtek snd_hda_codec_generic le>
Mai 21 19:09:06 vdr kernel: ---[ end trace 475f9c7132a03933 ]---
~~~

The previous kernel, 5.11.16-arch1-1, worked. I see that the kernel is now compiled with GCC 11 instead of GCC 10. Could there be some code generation going on here that is incompatible with the Ryzen 9 3900 XT?

Additional info:
* package version(s)
* config and/or log files etc.
* link to upstream bug report, if any

Steps to reproduce:
Upgrade kernel and boot.
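For context, block/bio.c:52 in 5.12 is the BUG() in biovec_slab(), which fires when a bio asks for more biovecs than BIO_MAX_VECS (256); the repeated 0xc00 (3072) values in the registers above suggest a request for far more vectors than that limit. A rough paraphrase of that check (a sketch based on my reading of the 5.12 sources, not a verbatim quote):

~~~
/* Rough paraphrase of biovec_slab() from 5.12 block/bio.c (not verbatim):
 * the slab lookup only covers vector counts up to BIO_MAX_VECS (256);
 * anything larger falls through to BUG(), which is the crash seen here. */
static struct biovec_slab *biovec_slab(unsigned short nr_vecs)
{
	switch (nr_vecs) {
	/* smaller counts use the bio's inline vecs */
	case 5 ... 16:
		return &bvec_slabs[0];
	case 17 ... 64:
		return &bvec_slabs[1];
	case 65 ... 128:
		return &bvec_slabs[2];
	case 129 ... BIO_MAX_VECS:
		return &bvec_slabs[3];
	default:
		BUG();
		return NULL;
	}
}
~~~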
Closed by Toolybird (Toolybird)
Tuesday, 06 June 2023, 03:21 GMT
Reason for closing: No response
Additional comments about closing: Old and stale. If still an issue, please follow PM's instructions and report the issue upstream and submit the patch.
https://bbs.archlinux.org/viewtopic.php?id=266125
BTW, I see the exact same behavior as in https://bbs.archlinux.org/viewtopic.php?pid=1971470#p1971470, thanks for pointing me there!
~~~
[root@vdr jens]# df -h
Dateisystem Größe Benutzt Verf. Verw% Eingehängt auf
dev 16G 0 16G 0% /dev
run 16G 1,4M 16G 1% /run
/dev/sda4 63G 33G 27G 56% /
tmpfs 16G 0 16G 0% /dev/shm
tmpfs 16G 0 16G 0% /tmp
/dev/md0 1,8T 739G 1002G 43% /mnt/raid
tmpfs 3,2G 60K 3,2G 1% /run/user/1000
tmpfs 3,2G 60K 3,2G 1% /run/user/969
~~~
https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git/log/?h=block-5.13
and
https://lore.kernel.org/linux-bcache/180599cb-7c2e-da35-96a5-225462c6cd71@kernel.dk/T/#t
These two tested patches are supposed to fix the issue for actual bcache use. They are probably going into 5.13. Please consider applying them to all supported kernels affected by this issue.
https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git/patch/?id=1616a4c2ab1a80893b6890ae93da40a2b1d0c691
https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git/patch/?id=41fe8d088e96472f63164e213de44ec77be69478
https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.12.11
https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/diff/releases/5.12.11/bcache-avoid-oversized-read-request-in-cache-missing-code-path.patch?h=v5.12.11
https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/diff/releases/5.12.11/bcache-remove-bcache-device-self-defined-readahead.patch?h=v5.12.11
My OS is on LVM, which is on mdraid. I don't use bcache.
Part of call trace:
---
[ 8.096941] Call Trace:
[ 8.097936] bvec_alloc+0x90/0xc0
[ 8.098934] bio_alloc_bioset+0x1b3/0x260
[ 8.099959] raid1_make_request+0x9ce/0xc50 [raid1]
[ 8.100988] ? __bio_clone_fast+0xa8/0xe0
[ 8.102008] md_handle_request+0x158/0x1d0 [md_mod]
[ 8.103050] md_submit_bio+0xcd/0x110 [md_mod]
[ 8.104084] submit_bio_noacct+0x139/0x530
[ 8.105127] submit_bio+0x78/0x1d0
[ 8.106163] ext4_io_submit+0x48/0x60 [ext4]
[ 8.107242] ext4_writepages+0x652/0x1170 [ext4]
[ 8.108300] ? do_writepages+0x41/0x100
[ 8.109338] ? __ext4_mark_inode_dirty+0x240/0x240 [ext4]
[ 8.110406] do_writepages+0x41/0x100
[ 8.111450] __filemap_fdatawrite_range+0xc5/0x100
[ 8.112513] file_write_and_wait_range+0x61/0xb0
[ 8.113564] ext4_sync_file+0x73/0x370 [ext4]
[ 8.114607] __x64_sys_fsync+0x33/0x60
[ 8.115635] do_syscall_64+0x33/0x40
[ 8.116670] entry_SYSCALL_64_after_hwframe+0x44/0xae
---
[1] https://github.com/archlinux/linux/commits/v5.13.9-arch1
I was trying to set up the Arch kernel build environment to do so, but I am relatively new to Arch (former Gentoo user) and was struggling a bit yesterday with makepkg and keys. I guess I'll just need to dedicate some more time to it.
But if it takes too long, I might just disable write-behind on those devices; IIUC that would fix the issue, too.
Compiling raid1.c with that patch applied fails:

~~~
drivers/md/raid1.c: In function ‘raid1_write_request’:
drivers/md/raid1.c:1454:67: error: ‘PAGE_SECTORS’ undeclared (first use in this function); did you mean ‘READ_SECTORS’?
 1454 | max_sectors = min_t(uint32_t, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
~~~
Edit:
Same error with 5.14-rc5. Attached the patch I used.
#include "bcache/util.h"
at the top of raid1.c, it compiles, at least. But it feels kind of wrong to make md depend on bcache?
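A possibly cleaner alternative (just a sketch, assuming PAGE_SECTORS is not already provided by a generic header in that kernel version) would be to define the constant locally in raid1.c instead of pulling in bcache/util.h:

~~~
/* Hypothetical local fallback for the missing PAGE_SECTORS symbol in
 * drivers/md/raid1.c, to avoid a dependency on bcache/util.h.
 * PAGE_SECTORS is simply the number of 512-byte sectors per page. */
#ifndef PAGE_SECTORS
#define PAGE_SECTORS	(PAGE_SIZE >> SECTOR_SHIFT)
#endif
~~~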
I just tried kernel 5.14.7-arch1-1 which is supposed to contain the patch. Unfortunately the problem persists, see attached log.
With the custom kernel I built from the initial tentative patch (which looks a bit different from what went into the kernel sources), it still works.
The condition used to decide whether we need to split differed from the condition used to decide how much to allocate.
This patch simply always splits when there is a bitmap and max_sectors is too big, as sketched below.
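A minimal sketch of that idea (simplified and with variable names assumed from raid1_write_request(); not the exact code that went upstream): clamp max_sectors so the write-behind bio can never need more than BIO_MAX_VECS pages, and use the same clamped value for both the split decision and the allocation.

~~~
/* Simplified sketch of the intended raid1_write_request() behaviour,
 * not the exact upstream patch: with a bitmap present, cap the request
 * at BIO_MAX_VECS pages so bvec_alloc() can never be asked for more
 * vectors than it supports, then split against that same cap. */
if (bitmap)
	max_sectors = min_t(int, max_sectors,
			    BIO_MAX_VECS * (PAGE_SIZE >> SECTOR_SHIFT));

if (max_sectors < bio_sectors(bio)) {
	struct bio *split = bio_split(bio, max_sectors, GFP_NOIO,
				      &conf->bio_split);
	bio_chain(split, bio);
	submit_bio_noacct(bio);
	bio = split;
}
~~~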