Arch Linux

FS#70992 - bvec_alloc crash with kernel 5.12.5-arch1-1

Attached to Project: Arch Linux
Opened by Jens Stutte (jensstutte) - Saturday, 22 May 2021, 19:23 GMT
Last edited by Andreas Radke (AndyRTR) - Saturday, 22 May 2021, 19:34 GMT
Task Type Bug Report
Category Kernel
Status Assigned
Assigned To Jan Alexander Steffens (heftig)
Architecture x86_64
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 0%
Votes 0
Private No

Details

Description:
After upgrading to kernel 5.12.5-arch1-1, I experience frequent hangs and found the following in my journalctl:

May 21 19:09:06 vdr kernel: ------------[ cut here ]------------
May 21 19:09:06 vdr kernel: kernel BUG at block/bio.c:52!
May 21 19:09:06 vdr kernel: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
May 21 19:09:06 vdr kernel: CPU: 13 PID: 272 Comm: kworker/u64:4 Not tainted 5.12.5-arch1-1 #1
May 21 19:09:06 vdr kernel: Hardware name: ASUS System Product Name/TUF GAMING B550M-PLUS, BIOS 1804 02/02/2021
May 21 19:09:06 vdr kernel: Workqueue: writeback wb_workfn (flush-9:0)
May 21 19:09:06 vdr kernel: RIP: 0010:biovec_slab.part.0+0x5/0x10
May 21 19:09:06 vdr kernel: Code: 81 18 63 00 48 8b 6b f0 48 85 ed 75 ca 5b 4c 89 e7 5d 41 5c e9 4c 18 63 00 48 c7 43 f8 00 00 00 00 eb c1 66 90 0f 1f 44 00 00 <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 4>
May 21 19:09:06 vdr kernel: RSP: 0018:ffffb37500f27620 EFLAGS: 00010202
May 21 19:09:06 vdr kernel: RAX: 00000000000000bf RBX: ffffb37500f27654 RCX: 0000000000000100
May 21 19:09:06 vdr kernel: RDX: 0000000000000c00 RSI: ffffb37500f27654 RDI: ffff970080e9dc38
May 21 19:09:06 vdr kernel: RBP: 0000000000000c00 R08: ffff970080e9dc38 R09: ffff970109e06a00
May 21 19:09:06 vdr kernel: R10: 0000000000000004 R11: ffffb37500f27788 R12: ffff970080e9dc38
May 21 19:09:06 vdr kernel: R13: 0000000000000c00 R14: 0000000000000c00 R15: ffff970080e9dbf0
May 21 19:09:06 vdr kernel: FS: 0000000000000000(0000) GS:ffff97078ed40000(0000) knlGS:0000000000000000
May 21 19:09:06 vdr kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 21 19:09:06 vdr kernel: CR2: 00007fa7679f3010 CR3: 000000014c682000 CR4: 0000000000350ee0
May 21 19:09:06 vdr kernel: Call Trace:
May 21 19:09:06 vdr kernel: bvec_alloc+0x90/0xc0
May 21 19:09:06 vdr kernel: bio_alloc_bioset+0x1b3/0x260
May 21 19:09:06 vdr kernel: raid1_make_request+0x9ce/0xc50 [raid1]
May 21 19:09:06 vdr kernel: ? __bio_clone_fast+0xa8/0xe0
May 21 19:09:06 vdr kernel: md_handle_request+0x158/0x1d0 [md_mod]
May 21 19:09:06 vdr kernel: md_submit_bio+0xcd/0x110 [md_mod]
May 21 19:09:06 vdr kernel: submit_bio_noacct+0x139/0x530
May 21 19:09:06 vdr kernel: ? __test_set_page_writeback+0x89/0x2d0
May 21 19:09:06 vdr kernel: submit_bio+0x78/0x1d0
May 21 19:09:06 vdr kernel: ext4_bio_write_page+0x1fd/0x630 [ext4]
May 21 19:09:06 vdr kernel: mpage_submit_page+0x46/0x80 [ext4]
May 21 19:09:06 vdr kernel: ext4_writepages+0x9ed/0x1170 [ext4]
May 21 19:09:06 vdr kernel: ? do_writepages+0x41/0x100
May 21 19:09:06 vdr kernel: do_writepages+0x41/0x100
May 21 19:09:06 vdr kernel: ? __wb_calc_thresh+0x4b/0x140
May 21 19:09:06 vdr kernel: __writeback_single_inode+0x3d/0x310
May 21 19:09:06 vdr kernel: ? wbc_detach_inode+0x13f/0x210
May 21 19:09:06 vdr kernel: writeback_sb_inodes+0x1fc/0x480
May 21 19:09:06 vdr kernel: __writeback_inodes_wb+0x4c/0xe0
May 21 19:09:06 vdr kernel: wb_writeback+0x22e/0x320
May 21 19:09:06 vdr kernel: wb_workfn+0x392/0x5c0
May 21 19:09:06 vdr kernel: process_one_work+0x214/0x3e0
May 21 19:09:06 vdr kernel: worker_thread+0x4d/0x3d0
May 21 19:09:06 vdr kernel: ? process_one_work+0x3e0/0x3e0
May 21 19:09:06 vdr kernel: kthread+0x133/0x150
May 21 19:09:06 vdr kernel: ? kthread_associate_blkcg+0xc0/0xc0
May 21 19:09:06 vdr kernel: ret_from_fork+0x22/0x30
May 21 19:09:06 vdr kernel: Modules linked in: cfg80211 8021q garp mrp stp llc nct6775 mousedev joydev intel_rapl_msr intel_rapl_common amdgpu edac_mce_amd snd_hda_codec_realtek snd_hda_codec_generic le>
May 21 19:09:06 vdr kernel: ---[ end trace 475f9c7132a03933 ]---

The previous kernel, 5.11.16-arch1-1, worked.
I see that the kernel is now compiled with GCC 11 instead of GCC 10. Could there be code generation here that is incompatible with my Ryzen 9 3900 XT?

Additional info:
* package version(s)
* config and/or log files etc.
* link to upstream bug report, if any

Steps to reproduce:
Upgrade kernel and boot.

Comment by loqs (loqs) - Saturday, 22 May 2021, 21:34 GMT
Comment by Jens Stutte (jensstutte) - Sunday, 23 May 2021, 07:17 GMT
Not that I am aware of; there are no /dev/bcache* devices, and from what I have read I would surely remember the non-trivial setup routine. The system contains only SSDs (except for a normally switched-off external backup HD).
BTW, I see the exact same behavior as in https://bbs.archlinux.org/viewtopic.php?pid=1971470#p1971470, thanks for pointing me there!
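For anyone wanting to run the same check, a quick way to look for bcache involvement is the following sketch. It uses the standard bcache device and module names; nothing here is taken from this particular system:

```shell
# Look for bcache block devices and the bcache kernel module.
# Both checks are read-only and safe to run anywhere.
if ls /dev/bcache* >/dev/null 2>&1; then
    echo "bcache devices present"
else
    echo "no bcache devices"
fi
if grep -qw bcache /proc/modules 2>/dev/null; then
    echo "bcache module loaded"
else
    echo "bcache module not loaded"
fi
```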
Comment by Jens Stutte (jensstutte) - Sunday, 23 May 2021, 07:19 GMT
PS - not sure if it can be related: I do use mdadm (md RAID) on that system:

~~~
[root@vdr jens]# df -h
Filesystem Size Used Avail Use% Mounted on
dev 16G 0 16G 0% /dev
run 16G 1,4M 16G 1% /run
/dev/sda4 63G 33G 27G 56% /
tmpfs 16G 0 16G 0% /dev/shm
tmpfs 16G 0 16G 0% /tmp
/dev/md0 1,8T 739G 1002G 43% /mnt/raid
tmpfs 3,2G 60K 3,2G 1% /run/user/1000
tmpfs 3,2G 60K 3,2G 1% /run/user/969
~~~
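Since the array behind /dev/md0 is in every backtrace, its state can be inspected with the usual md tools. A sketch (/dev/md0 is taken from the df output above; on machines without md these commands simply report that nothing is there):

```shell
# Kernel-level RAID state; /proc/mdstat only exists when md is active.
cat /proc/mdstat 2>/dev/null || echo "no /proc/mdstat (md not active)"
# Array details; requires the mdadm package and usually root.
mdadm --detail /dev/md0 2>/dev/null || echo "mdadm not available or /dev/md0 absent"
```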
Comment by Jens Stutte (jensstutte) - Sunday, 23 May 2021, 08:00 GMT
There seems to be a patch on its way: https://lkml.org/lkml/2021/5/17/1888 (actually a series of a few patches, it seems). While I am not sure why this can happen on my machine (I see no bcache module loaded), we might just want to wait for that patch to land.
Comment by Jens Stutte (jensstutte) - Sunday, 23 May 2021, 08:33 GMT
Sorry for the noise, duplicate post.
Comment by Jens Stutte (jensstutte) - Sunday, 23 May 2021, 10:19 GMT
Looking at this a bit more, I see that the [BUG in bio.c](https://elixir.bootlin.com/linux/v5.12-rc1/source/block/bio.c#L52) was introduced in 5.12. Obviously it can reveal formerly hidden bugs in any caller, and the bcache one might be just one of those. So I assume I am hitting a different case, probably triggered by my RAID setup. I am not familiar with how to find the "regressing" patch, though, which might contain further hints on what to do.
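For reference, the usual way to find a regressing patch is git bisect. The mechanics can be sketched on a throwaway repository; for the kernel itself one would bisect torvalds/linux between v5.11 (good) and v5.12 (bad), rebuilding and booting at each step. The repository and the "bug" below are purely illustrative:

```shell
# Build a tiny repo of 5 commits where "the bug" appears in commit 4
# (the file 'counter' reaching 4), then let git bisect find it.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email bisect@example.com
git config user.name bisect-demo
for i in 1 2 3 4 5; do
    echo "$i" > counter
    git add counter
    git commit -qm "commit $i"
done
# bad = HEAD (commit 5), good = HEAD~4 (commit 1)
git bisect start HEAD HEAD~4
# 'git bisect run' re-runs the test at each step: exit 0 = good, non-zero = bad.
git bisect run sh -c 'test "$(cat counter)" -lt 4'
# The bisect log records which commit first showed the bug.
git bisect log | grep "first bad commit"
```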
Comment by Jens Stutte (jensstutte) - Sunday, 23 May 2021, 12:13 GMT
Comment by Jens Stutte (jensstutte) - Sunday, 23 May 2021, 12:38 GMT
Comment by Alexander Ullrich (Mika79) - Friday, 11 June 2021, 22:25 GMT
Comment by Craig (Hazey) - Thursday, 17 June 2021, 20:03 GMT
Comment by Jens Stutte (jensstutte) - Friday, 18 June 2021, 06:04 GMT
Just to be sure: I have not configured bcache on my system. Are these patches supposed to solve this problem anyway? I will try to find some time to check, but judging from the files they touch it seems unlikely.
Comment by Ilgiz (Nurik) - Wednesday, 07 July 2021, 11:44 GMT
Still the same issue on kernel 5.12.14-arch1-1. Any help, please?

My OS is on LVM, which is on md RAID. I don't use bcache.

Part of the call trace:

~~~
[ 8.096941] Call Trace:
[ 8.097936] bvec_alloc+0x90/0xc0
[ 8.098934] bio_alloc_bioset+0x1b3/0x260
[ 8.099959] raid1_make_request+0x9ce/0xc50 [raid1]
[ 8.100988] ? __bio_clone_fast+0xa8/0xe0
[ 8.102008] md_handle_request+0x158/0x1d0 [md_mod]
[ 8.103050] md_submit_bio+0xcd/0x110 [md_mod]
[ 8.104084] submit_bio_noacct+0x139/0x530
[ 8.105127] submit_bio+0x78/0x1d0
[ 8.106163] ext4_io_submit+0x48/0x60 [ext4]
[ 8.107242] ext4_writepages+0x652/0x1170 [ext4]
[ 8.108300] ? do_writepages+0x41/0x100
[ 8.109338] ? __ext4_mark_inode_dirty+0x240/0x240 [ext4]
[ 8.110406] do_writepages+0x41/0x100
[ 8.111450] __filemap_fdatawrite_range+0xc5/0x100
[ 8.112513] file_write_and_wait_range+0x61/0xb0
[ 8.113564] ext4_sync_file+0x73/0x370 [ext4]
[ 8.114607] __x64_sys_fsync+0x33/0x60
[ 8.115635] do_syscall_64+0x33/0x40
[ 8.116670] entry_SYSCALL_64_after_hwframe+0x44/0xae

~~~
Comment by Jens Stutte (jensstutte) - Thursday, 08 July 2021, 16:45 GMT
Hi Ilgiz, thanks for confirming my suspicion (though I would have preferred to see it solved, of course). I have noted your findings on https://bugzilla.kernel.org/show_bug.cgi?id=213181 in order to accelerate things a bit.
Comment by loqs (loqs) - Thursday, 08 July 2021, 22:06 GMT
Is the issue present on 5.13.1? If so, please try 5.14-rc1 when it is released.