Please read this before reporting a bug:
https://wiki.archlinux.org/title/Bug_reporting_guidelines
Do NOT report bugs when a package is just outdated, or it is in the AUR. Use the 'flag out of date' link on the package page, or the Mailing List.
REPEAT: Do NOT report bugs for outdated packages!
FS#77041 - [linux] btrfs module crash in btrfs_get_64 when transferring data leading to locked I/O
Attached to Project:
Arch Linux
Opened by James Beddek (telans) - Sunday, 08 January 2023, 05:19 GMT
Last edited by Jan Alexander Steffens (heftig) - Wednesday, 18 January 2023, 21:40 GMT
Details
Upon transferring files/data within my btrfs partition eventually all I/O for that partition locks up. I noticed a backtrace for the btrfs module in my `dmesg`.

I believe this occurs when transferring data across subvolumes, but I cannot be sure about that. I have just moved from Gentoo to Arch Linux, with the same partition, and did not face this issue there. I cannot find any information specifically related to this online. I have encountered this issue 4 times in the past 3 days on kernels 6.1.2 & 6.1.3 (which is how long ago I switched to Arch). This crash does not appear to cause any loss of data or metadata (excluding where not synced).

Additional info:
* linux-6.1.{2,3}-arch1
* Using the default linux kernel supplied by Arch Linux.
* Mount options: `rw,noatime,compress-force=zstd:3,ssd,discard=async,space_cache,subvol=archlinux@`

Steps to reproduce:
- Unknown
- Tentative:
  - Repeatedly transfer large quantities of data across subvolumes (~50GiB).
  - Note certain programs/operations appear to hang or crash (file transfer operations, Dolphin).
  - Observe backtrace from the btrfs module in `dmesg`.

The call trace is always the same:
RIP: 0010:btrfs_get_64+0xdc/0x120 [btrfs]
[...]
<TASK>
btrfs_file_llseek+0x274/0x690 [btrfs 1bd76ddf8f9403becdd22e75f4726e3deb87c09d]
ksys_lseek+0x69/0xb0
do_syscall_64+0x5f/0x90
? syscall_exit_to_user_mode+0x1b/0x40
? syscall_exit_to_user_mode+0x1b/0x40
? do_syscall_64+0x6b/0x90
? do_syscall_64+0x6b/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd

I have attached the full kernel backtrace as a file.
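For anyone wanting to check whether their own crash matches this one, here is a rough Python sketch that pulls the faulting symbol out of the `RIP:` line and lists the call-trace frames without their offsets (offsets can shift between kernel builds while the symbol names stay the same). The helper names `fault_location` and `reliable_frames` are made up for illustration, and the regular expressions only assume the `symbol+0xoff/0xlen` layout visible in the trace above; frames prefixed with `?` are skipped because the kernel marks those as unreliable.

```python
import re

# Matches e.g. "RIP: 0010:btrfs_get_64+0xdc/0x120 [btrfs]".
RIP_RE = re.compile(
    r"RIP: [0-9a-f]{4}:(?P<symbol>[A-Za-z_]\w*)\+(?P<offset>0x[0-9a-f]+)"
    r"/0x[0-9a-f]+(?: \[(?P<module>\w+)\])?"
)

# Matches one stack frame, e.g. "? do_syscall_64+0x6b/0x90".
FRAME_RE = re.compile(
    r"(?P<unreliable>\? )?(?P<symbol>[A-Za-z_]\w*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def fault_location(log_text):
    """Return (symbol, offset, module) for the first kernel RIP line, or None."""
    m = RIP_RE.search(log_text)
    if m is None:
        return None
    return m.group("symbol"), int(m.group("offset"), 16), m.group("module")


def reliable_frames(trace_text):
    """Return frame symbols in order, dropping offsets and '?' frames."""
    return [
        m.group("symbol")
        for m in FRAME_RE.finditer(trace_text)
        if not m.group("unreliable")
    ]


if __name__ == "__main__":
    sample = (
        "RIP: 0010:btrfs_get_64+0xdc/0x120 [btrfs]\n"
        "<TASK>\n"
        "btrfs_file_llseek+0x274/0x690 [btrfs 1bd76ddf8f9403becdd22e75f4726e3deb87c09d]\n"
        "ksys_lseek+0x69/0xb0\n"
        "? do_syscall_64+0x6b/0x90\n"
    )
    print(fault_location(sample))
    # Only pass the part after <TASK>, so the RIP line is not counted as a frame.
    print(reliable_frames(sample.split("<TASK>", 1)[1]))
```

Feeding it the output of `dmesg` or `journalctl -k` saved to a file should work the same way, as long as the lines keep the format shown in this report.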
Closed by Jan Alexander Steffens (heftig)
Wednesday, 18 January 2023, 21:40 GMT
Reason for closing: Fixed
Additional comments about closing: 6.1.7.arch1-1
btrfs-crash.txt
[1] https://wiki.archlinux.org/title/Kernel#Debugging_regressions
Some digging leads me to believe it was this commit which fixes my issue: https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git/commit/?h=for-6.2-rc2&id=560840afc3e63bbe5d9c5ef6b2ecf8f3589adff6
The call trace isn't the same, but the instruction pointer and the preceding btrfs warning are. That commit has been back-ported to most releases as far as I can tell, and it is in the 6.1.4 changelog.
I'm going to call this fixed with linux-6.1.4.
Here are the journal messages for that crash:
- http://ix.io/4l9c
Note these errors:
Jan 13 15:15:53 my.machine kernel: BTRFS warning (device dm-0): bad eb member end: ptr 0x3fea start 337862918144 member offset 16383 size 8
Jan 13 15:15:53 my.machine kernel: general protection fault, probably for non-canonical address 0x2e64000000000: 0000 [#1] PREEMPT SMP PTI
Jan 13 15:15:53 my.machine kernel: CPU: 6 PID: 2904 Comm: mkfs.ext4.real Tainted: P OE 6.1.5-arch2-1 #1 757c0dd88e3c02c9c4f37532c2a810ffa3dfb1d6
Jan 13 15:15:53 my.machine kernel: Hardware name: LENOVO 30C5CTO1WW/3138, BIOS M1VKT43A 06/24/2019
Jan 13 15:15:53 my.machine kernel: RIP: 0010:btrfs_get_64+0xdc/0x120 [btrfs]
Jan 13 15:15:53 my.machine kernel: Code: 4a 8b 44 e5 78 48 2b 05 d2 de e4 ec 48 c1 f8 06 48 c1 e0 0c 48 03 05 d3 de e4 ec 81 eb f8 0f 00 00 74 13 31 d2 89 d6 83 c2 01 <0f> b6 3c 30 40 88 3c 31 39 da 72 ef 48 8b 44 24 08 48 8b 54 24 10
Jan 13 15:15:53 my.machine kernel: RSP: 0018:ffffb393c4567dd8 EFLAGS: 00010202
Jan 13 15:15:53 my.machine kernel: RAX: 0002e64000000000 RBX: 0000000000000007 RCX: ffffb393c4567de1
Jan 13 15:15:53 my.machine kernel: RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000000000000000a
Jan 13 15:15:53 my.machine kernel: RBP: ffff8ff1e3d62a00 R08: 0000000000000001 R09: ffffb393c4567bc0
Jan 13 15:15:53 my.machine kernel: R10: 0000000000000003 R11: ffffffffadccb828 R12: 0000000000000003
Jan 13 15:15:53 my.machine kernel: R13: 0000000000003fea R14: 0000000000000000 R15: 0000000000000000
Jan 13 15:15:53 my.machine kernel: FS: 00007f2a4cb8e780(0000) GS:ffff8ff83bd80000(0000) knlGS:0000000000000000
Jan 13 15:15:53 my.machine kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 13 15:15:53 my.machine kernel: CR2: 00007f9963f32538 CR3: 00000001f437a001 CR4: 00000000003706e0
Jan 13 15:15:53 my.machine kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 13 15:15:53 my.machine kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jan 13 15:15:53 my.machine kernel: Call Trace:
Jan 13 15:15:53 my.machine kernel: <TASK>
Jan 13 15:15:53 my.machine kernel: btrfs_file_llseek+0x274/0x690 [btrfs ea980635911aadb0b3570657adf076de23cc00b4]
Jan 13 15:15:53 my.machine kernel: ksys_lseek+0x66/0xb0
Jan 13 15:15:53 my.machine kernel: do_syscall_64+0x5c/0x90
Jan 13 15:15:53 my.machine kernel: ? syscall_exit_to_user_mode+0x1b/0x40
Jan 13 15:15:53 my.machine kernel: ? do_syscall_64+0x6b/0x90
Jan 13 15:15:53 my.machine kernel: ? do_syscall_64+0x6b/0x90
Jan 13 15:15:53 my.machine kernel: ? do_syscall_64+0x6b/0x90
Jan 13 15:15:53 my.machine kernel: entry_SYSCALL_64_after_hwframe+0x63/0xcd
Jan 13 15:15:53 my.machine kernel: RIP: 0033:0x7f2a4cd9e99b
Jan 13 15:15:53 my.machine kernel: Code: c3 48 8b 15 2f a6 00 00 f7 d8 64 89 02 b8 ff ff ff ff eb be 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 08 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 f9 a5 00 00 f7 d8
Jan 13 15:15:53 my.machine kernel: RSP: 002b:00007ffde5136dd8 EFLAGS: 00000297 ORIG_RAX: 0000000000000008
Jan 13 15:15:53 my.machine kernel: RAX: ffffffffffffffda RBX: 0000560c332e1fb0 RCX: 00007f2a4cd9e99b
Jan 13 15:15:53 my.machine kernel: RDX: 0000000000000003 RSI: 0000000000000000 RDI: 0000000000000004
Jan 13 15:15:53 my.machine kernel: RBP: 0000000000000004 R08: 0000560c33316fc0 R09: 000000000000007c
Jan 13 15:15:53 my.machine kernel: R10: 0000560c332dc010 R11: 0000000000000297 R12: 0000560c3331a280
Jan 13 15:15:53 my.machine kernel: R13: 0000560c33316fc0 R14: 0000560c33304a80 R15: 0000000000000000
Jan 13 15:15:53 my.machine kernel: </TASK>
Jan 13 15:15:53 my.machine kernel: Modules linked in: xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache netfs snd_seq_dummy snd_hrtimer snd_seq snd_seq_device snd_sof_pci_intel_cnl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils soundwire_bus snd_soc_skl snd_soc_hdac_hda snd_hda_ext_core intel_rapl_msr snd_soc_sst_ipc intel_rapl_common snd_soc_sst_dsp snd_soc_acpi_intel_match snd_soc_acpi snd_soc_core snd_hda_codec_realtek intel_tcc_cooling snd_compress x86_pkg_temp_thermal snd_hda_codec_generic ac97_bus intel_powerclamp ledtrig_audio snd_hda_codec_hdmi snd_pcm_dmaengine coretemp snd_hda_intel iTCO_wdt snd_intel_dspcfg kvm_intel snd_intel_sdw_acpi 8250_dw spi_nor intel_pmc_bxt
Jan 13 15:15:53 my.machine kernel: snd_hda_codec ee1004 mtd mei_pxp mei_hdcp mei_wdt kvm iTCO_vendor_support irqbypass snd_hda_core rapl vfat snd_hwdep fat intel_cstate snd_pcm intel_uncore think_lmi pcspkr firmware_attributes_class spi_intel_pci wmi_bmof intel_wmi_thunderbolt i2c_i801 intel_lpss_pci snd_timer spi_intel i2c_smbus mei_me intel_lpss snd rtsx_usb_ms mei memstick idma64 soundcore ie31200_edac mousedev acpi_tad cfg80211 rfkill joydev acpi_pad mac_hid fuse ip_tables x_tables dm_crypt cbc encrypted_keys trusted asn1_encoder tee nvidia_drm(POE) nvidia_modeset(POE) rtsx_usb_sdmmc mmc_core rtsx_usb nvidia_uvm(POE) usbhid nvidia(POE) crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic gf128mul ghash_clmulni_intel sha512_ssse3 nvme aesni_intel igb crypto_simd nvme_core e1000e cryptd dca nvme_common xhci_pci xhci_pci_renesas video wmi btrfs blake2b_generic libcrc32c crc32c_generic crc32c_intel xor raid6_pq dm_mirror dm_region_hash dm_log dm_mod crypto_user
Jan 13 15:15:53 my.machine kernel: ---[ end trace 0000000000000000 ]---
Jan 13 15:15:53 my.machine kernel: RIP: 0010:btrfs_get_64+0xdc/0x120 [btrfs]
Jan 13 15:15:53 my.machine kernel: Code: 4a 8b 44 e5 78 48 2b 05 d2 de e4 ec 48 c1 f8 06 48 c1 e0 0c 48 03 05 d3 de e4 ec 81 eb f8 0f 00 00 74 13 31 d2 89 d6 83 c2 01 <0f> b6 3c 30 40 88 3c 31 39 da 72 ef 48 8b 44 24 08 48 8b 54 24 10
Jan 13 15:15:53 my.machine kernel: RSP: 0018:ffffb393c4567dd8 EFLAGS: 00010202
Jan 13 15:15:53 my.machine kernel: RAX: 0002e64000000000 RBX: 0000000000000007 RCX: ffffb393c4567de1
Jan 13 15:15:53 my.machine kernel: RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000000000000000a
Jan 13 15:15:53 my.machine kernel: RBP: ffff8ff1e3d62a00 R08: 0000000000000001 R09: ffffb393c4567bc0
Jan 13 15:15:53 my.machine kernel: R10: 0000000000000003 R11: ffffffffadccb828 R12: 0000000000000003
Jan 13 15:15:53 my.machine kernel: R13: 0000000000003fea R14: 0000000000000000 R15: 0000000000000000
Jan 13 15:15:53 my.machine kernel: FS: 00007f2a4cb8e780(0000) GS:ffff8ff83bd80000(0000) knlGS:0000000000000000
Jan 13 15:15:53 my.machine kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 13 15:15:53 my.machine kernel: CR2: 00007f9963f32538 CR3: 00000001f437a001 CR4: 00000000003706e0
Jan 13 15:15:53 my.machine kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 13 15:15:53 my.machine kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
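As a side note on the `bad eb member end` warning at the top of that log, the numbers can be checked with a few lines of Python. This is only a sketch under one loud assumption: that the filesystem uses the default 16 KiB nodesize, so an extent buffer (eb) is 16384 bytes long. Nothing in this report states the nodesize. The names `parse_eb_warning` and `overruns` are hypothetical helpers, and the regular expression just follows the wording of the warning as it appears above.

```python
import re

# Warning text copied from the journal excerpt above.
LINE = (
    "BTRFS warning (device dm-0): bad eb member end: "
    "ptr 0x3fea start 337862918144 member offset 16383 size 8"
)

# ASSUMPTION: default 16 KiB nodesize, i.e. each extent buffer is 16384 bytes.
ASSUMED_EB_LEN = 16384

PATTERN = re.compile(
    r"bad eb member (?P<kind>\w+): ptr (?P<ptr>0x[0-9a-f]+) "
    r"start (?P<start>\d+) member offset (?P<offset>\d+) size (?P<size>\d+)"
)


def parse_eb_warning(line):
    """Extract the numeric fields from a 'bad eb member' warning, or None."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return {
        "kind": m.group("kind"),
        "ptr": int(m.group("ptr"), 16),
        "start": int(m.group("start")),
        "offset": int(m.group("offset")),
        "size": int(m.group("size")),
    }


def overruns(fields, eb_len=ASSUMED_EB_LEN):
    """True if the member would extend past the end of the buffer."""
    return fields["offset"] + fields["size"] > eb_len


if __name__ == "__main__":
    fields = parse_eb_warning(LINE)
    print(fields)
    print("reads past assumed buffer end:", overruns(fields))
```

Under that assumption an 8-byte read starting at offset 16383 would run 7 bytes past the end of the buffer, which would fit the "end" wording of the warning and the general protection fault that follows it.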
I enabled bees [1] on my btrfs partition with 6.1.5, which causes extremely heavy reads/writes of block & extent data during the first pass. I somewhat expected that if this wasn't fixed I would've seen it again.
This may be the true fix:
https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git/commit/?h=for-next&id=79fe19ac2bd4680014a2d9cf08a46bfe3ecfad08
https://lore.kernel.org/linux-btrfs/7f25442f-b121-2a3a-5a3d-22bcaae83cd4%40leemhuis.info/
Looks like it will land in 6.1.7?
[1]: https://github.com/Zygo/bees
Side note: Why does Flyspray clobber the mailing list link?
Because it's crap? :) See the "Tip" here [1]. I've fixed this one for you.
[1] https://wiki.archlinux.org/title/Bug_reporting_guidelines#Summary
For me it happens when I install azure-cli from the AUR through paru. It crashes at the end when it's cleaning up.
I've also run into this crash before when I had some volumes bind mounted to a docker container and was running some IO intensive tasks in the container.
Edit: I'll check again once 6.1.7 lands.
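A quick way to see whether the running kernel is at or past the release this task was closed against (6.1.7.arch1-1) is sketched below in Python. The helper names `kernel_tuple` and `has_fix` are invented for this example, and the comparison is only meaningful for the Arch `linux` package in the 6.1 series and later; other stable branches may have picked up the fix at different version numbers.

```python
import platform
import re

# From the closing comment of this task: fixed in 6.1.7.arch1-1.
FIXED = (6, 1, 7)


def kernel_tuple(release):
    """Turn a release string such as '6.1.5-arch2-1' into (6, 1, 5)."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    # A missing patch level (e.g. '6.2-arch1') is treated as 0.
    return tuple(int(part or 0) for part in m.groups())


def has_fix(release, fixed=FIXED):
    """True if the release is numerically at or above the fixed version."""
    return kernel_tuple(release) >= fixed


if __name__ == "__main__":
    running = platform.release()  # same string that `uname -r` prints
    print(running, "-> at or past 6.1.7:", has_fix(running))
```

Note that `platform.release()` reports the kernel that is currently booted, so a reboot is needed after upgrading the package before the check says anything useful.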