FS#74420 - [linux-lts] bfq null pointer dereference in bfq_idle_extract -> __list_del_entry_valid

Attached to Project: Arch Linux
Opened by ValdikSS (ValdikSS) - Saturday, 09 April 2022, 13:53 GMT
Last edited by Andreas Radke (AndyRTR) - Tuesday, 31 January 2023, 20:50 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To Andreas Radke (AndyRTR)
Architecture All
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:

Since roughly kernel 5.10.52 or 5.10.56, a system configured with the bfq I/O scheduler hangs completely after some time, usually under heavy I/O load. All CPU cores lock up, and Alt+SysRq+B does not reboot the machine. The system either hangs indefinitely or is rebooted by the watchdog, if one is configured.
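
To check whether a given machine is exposed to this, it can help to list which block devices currently use bfq. The following is a minimal sketch, assuming the usual sysfs layout where /sys/block/<dev>/queue/scheduler lists the available schedulers and marks the active one in square brackets. The helper names active_scheduler and devices_using are made up for this illustration.

```python
from pathlib import Path


def active_scheduler(text):
    """Return the scheduler name shown in [brackets], or None if none is marked."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def devices_using(name="bfq", sys_block=Path("/sys/block")):
    """List block devices whose active I/O scheduler equals `name`."""
    found = []
    for sched_file in sorted(sys_block.glob("*/queue/scheduler")):
        try:
            text = sched_file.read_text()
        except OSError:
            # Device vanished or file is unreadable; skip it.
            continue
        if active_scheduler(text) == name:
            # .../<dev>/queue/scheduler -> <dev>
            found.append(sched_file.parent.parent.name)
    return found


if __name__ == "__main__":
    print(devices_using("bfq"))
```

On a system without bfq in use, the printed list is simply empty.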

I managed to capture the following kernel oops on 5.15.32 with netconsole, after which the platform hung. loop1 is an LXD file system stored on a system disk.

[42522.578543] BUG: kernel NULL pointer dereference, address: 0000000000000000
[42522.578555] #PF: supervisor read access in kernel mode
[42522.578559] #PF: error_code(0x0000) - not-present page
[42522.578562] PGD 0 P4D 0
[42522.578567] Oops: 0000 [#1] SMP PTI
[42522.578571] CPU: 13 PID: 213350 Comm: kworker/u32:7 Tainted: G S 5.15.32-1-lts #1 bb8765a1c0d822a5d87cc236b26af488e39e88db
[42522.578577] Hardware name: HUANANZHI X99 /X99-8M-F , BIOS 5.11 04/12/2021
[42522.578580] Workqueue: loop1 loop_workfn [loop]
[42522.578590] RIP: 0010:__list_del_entry_valid+0x25/0x90
[42522.578597] Code: c3 0f 1f 40 00 48 8b 17 4c 8b 47 08 48 b8 00 01 00 00 00 00 ad de 48 39 c2 74 26 48 b8 22 01 00 00 00 00 ad de 49 39 c0 74 2b <49> 8b 30 48 39 fe 75 3a 48 8b 52 08 48 39 f2 75 48 b8 01 00 00 00
[42522.578603] RSP: 0018:ffffaac5469a7890 EFLAGS: 00010017
[42522.578607] RAX: dead000000000122 RBX: ffff9aabfbb0b158 RCX: 0000000000000000
[42522.578610] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9aac70380140
[42522.578613] RBP: ffff9aac70380098 R08: 0000000000000000 R09: 0000000000000000
[42522.578617] R10: 0000000000000001 R11: ffff9aac4a139c00 R12: ffff9aac70380010
[42522.578620] R13: ffff9aac4a139c70 R14: 0000000000000000 R15: ffff9aabc6443c00
[42522.578623] FS: 0000000000000000(0000) GS:ffff9aaf2ff40000(0000) knlGS:0000000000000000
[42522.578627] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[42522.578631] CR2: 0000000000000000 CR3: 000000006ae10005 CR4: 00000000001706e0
[42522.578634] Call Trace:
[42522.578638] <TASK>
[42522.578641] bfq_idle_extract+0x52/0xb0
[42522.578648] bfq_put_idle_entity+0x12/0x60
[42522.578652] bfq_bfqq_served+0xc1/0x1a0
[42522.578657] bfq_dispatch_request+0x2d3/0x12a0
[42522.578661] ? __sbitmap_get_word+0x30/0x80
[42522.578668] __blk_mq_do_dispatch_sched+0x219/0x320
[42522.578674] ? recalibrate_cpu_khz+0x10/0x10
[42522.578681] ? ktime_get+0x38/0x90
[42522.578686] ? bfq_insert_requests+0x778/0x16e0
[42522.578690] __blk_mq_sched_dispatch_requests+0x109/0x160
[42522.578696] blk_mq_sched_dispatch_requests+0x30/0x60
[42522.578701] __blk_mq_run_hw_queue+0x2b/0x90
[42522.578707] __blk_mq_delay_run_hw_queue+0x144/0x150
[42522.578711] blk_mq_sched_insert_requests+0x63/0xe0
[42522.578717] blk_mq_flush_plug_list+0x10f/0x1a0
[42522.578722] blk_finish_plug+0x21/0x30
[42522.578728] __iomap_dio_rw+0x59e/0x7c0
[42522.578737] iomap_dio_rw+0xa/0x30
[42522.578741] ext4_file_read_iter+0x101/0x160 [ext4 dd6da0888b8148498814602f34d1bf7d7eae8148]
[42522.578793] lo_rw_aio.isra.0+0x2c3/0x2e0 [loop bc975935d69a92a419a6272704b0dab0b0464574]
[42522.578801] loop_process_work+0x6e4/0xcb0 [loop bc975935d69a92a419a6272704b0dab0b0464574]
[42522.578808] ? raw_spin_rq_lock_nested+0xa/0x10
[42522.578814] ? newidle_balance+0x2ef/0x400
[42522.578822] ? __switch_to_asm+0x42/0x70
[42522.578829] ? __switch_to+0x11b/0x420
[42522.578835] process_one_work+0x1f1/0x390
[42522.578841] worker_thread+0x53/0x3e0
[42522.578846] ? process_one_work+0x390/0x390
[42522.578850] kthread+0x127/0x150
[42522.578857] ? set_kthread_struct+0x40/0x40
[42522.578862] ret_from_fork+0x22/0x30
[42522.578868] </TASK>
[42522.578871] Modules linked in: nf_conntrack_netlink netconsole xt_conntrack nft_chain_nat xt_addrtype nft_counter xt_owner nft_compat nf_tables overlay ip6table_raw ip6t_rpfilter iptable_raw ipt_rpfilter veth xt_CHECKSUM xt_tcpudp xt_comment xt_MASQUERADE ip6table_nat ip6table_mangle ip6table_filter ip6_tables bridge stp llc btrfs blake2b_generic xor raid6_pq loop vhost_vsock vmw_vsock_virtio_transport_common vhost vhost_iotlb vsock dm_crypt cbc encrypted_keys trusted asn1_encoder tee tpm rng_core rfkill lzo_rle zram nfnetlink_queue nfnetlink iptable_mangle iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c nct6775 hwmon_vid intel_rapl_msr intel_rapl_common vfat fat x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_realtek snd_hda_codec_generic kvm_intel ledtrig_audio kvm snd_hda_intel irqbypass crct10dif_pclmul snd_intel_dspcfg snd_intel_sdw_acpi crc32_pclmul snd_hda_codec intel_spi_platform ghash_clmulni_intel aesni_intel iTCO_wdt psmouse
[42522.578926] serio_raw intel_spi snd_hda_core crypto_simd intel_pmc_bxt spi_nor atkbd mtd cryptd iTCO_vendor_support snd_hwdep mxm_wmi gpio_ich rapl r8169 libps2 intel_cstate snd_pcm snd_timer snd i2c_i801 realtek mdio_devres intel_uncore i2c_smbus soundcore wmi libphy lpc_ich mac_hid i8042 serio sch_fq tcp_bbr dm_multipath dm_mod sg crypto_user fuse bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 xhci_pci crc32c_intel xhci_pci_renesas
[42522.578987] CR2: 0000000000000000
[42522.578991] ---[ end trace 9588af7b4567ad40 ]---
[42522.578995] RIP: 0010:__list_del_entry_valid+0x25/0x90
[42522.579000] Code: c3 0f 1f 40 00 48 8b 17 4c 8b 47 08 48 b8 00 01 00 00 00 00 ad de 48 39 c2 74 26 48 b8 22 01 00 00 00 00 ad de 49 39 c0 74 2b <49> 8b 30 48 39 fe 75 3a 48 8b 52 08 48 39 f2 75 48 b8 01 00 00 00
[42522.579005] RSP: 0018:ffffaac5469a7890 EFLAGS: 00010017
[42522.579009] RAX: dead000000000122 RBX: ffff9aabfbb0b158 RCX: 0000000000000000
[42522.579013] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9aac70380140
[42522.579016] RBP: ffff9aac70380098 R08: 0000000000000000 R09: 0000000000000000
[42522.579019] R10: 0000000000000001 R11: ffff9aac4a139c00 R12: ffff9aac70380010
[42522.579023] R13: ffff9aac4a139c70 R14: 0000000000000000 R15: ffff9aabc6443c00
[42522.579027] FS: 0000000000000000(0000) GS:ffff9aaf2ff40000(0000) knlGS:0000000000000000
[42522.579031] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[42522.579034] CR2: 0000000000000000 CR3: 000000006ae10005 CR4: 00000000001706e0
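
For comparing this oops with other reports, the reliable call-trace frames and the register values can be pulled out of the log mechanically. The sketch below is only an illustration: the function names are hypothetical, and the LIST_POISON1/LIST_POISON2 constants are assumed to be the x86_64 defaults from include/linux/poison.h. In the log above, RAX matches the assumed LIST_POISON2 value (it appears to be the constant the check compares against), while R08 and RDX are both 0. That would be consistent with the list entry's pointers being NULL rather than poisoned, but this is an interpretation, not something the report confirms.

```python
import re

# Assumed x86_64 defaults (include/linux/poison.h); treat as an assumption.
LIST_POISON1 = 0xDEAD000000000100
LIST_POISON2 = 0xDEAD000000000122

# "[ts] symbol+0xOFF/0xLEN", optionally prefixed by "? " for unreliable frames.
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)
# General-purpose registers printed as "RAX: <16 hex digits>".
REG_RE = re.compile(r"\b(R[A-Z0-9]{1,2}): ([0-9a-f]{16})\b")


def call_trace(log):
    """Return reliable frame names between 'Call Trace:' and '</TASK>'."""
    frames, in_trace = [], False
    for line in log.splitlines():
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        if "</TASK>" in line:
            break
        m = FRAME_RE.match(line.strip())
        if m and not m.group(1):  # skip "? " frames
            frames.append(m.group(2))
    return frames


def registers(log):
    """Return {name: value} from the first register dump in the log."""
    regs = {}
    for name, value in REG_RE.findall(log):
        regs.setdefault(name, int(value, 16))
    return regs


# Excerpt copied from the oops in this report.
SAMPLE = """\
[42522.578607] RAX: dead000000000122 RBX: ffff9aabfbb0b158 RCX: 0000000000000000
[42522.578610] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9aac70380140
[42522.578613] RBP: ffff9aac70380098 R08: 0000000000000000 R09: 0000000000000000
[42522.578634] Call Trace:
[42522.578638] <TASK>
[42522.578641] bfq_idle_extract+0x52/0xb0
[42522.578648] bfq_put_idle_entity+0x12/0x60
[42522.578652] bfq_bfqq_served+0xc1/0x1a0
[42522.578657] bfq_dispatch_request+0x2d3/0x12a0
[42522.578661] ? __sbitmap_get_word+0x30/0x80
[42522.578668] __blk_mq_do_dispatch_sched+0x219/0x320
[42522.578868] </TASK>
"""

if __name__ == "__main__":
    regs = registers(SAMPLE)
    print(call_trace(SAMPLE))
    print("RAX is LIST_POISON2:", regs["RAX"] == LIST_POISON2)
    print("NULL registers:", sorted(r for r, v in regs.items() if v == 0))
```

Frames marked with "?" are skipped because the kernel prints them as unreliable stack entries.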


Additional info:
* package version(s)
linux-lts 5.15.32-1
* config and/or log files etc.
* link to upstream bug report, if any
https://bugzilla.kernel.org/show_bug.cgi?id=215824

Steps to reproduce:
No exact steps.

Closed by Andreas Radke (AndyRTR)
Tuesday, 31 January 2023, 20:50 GMT
Reason for closing: Upstream
