FS#70955 - [linux] high kworker CPU usage

Attached to Project: Arch Linux
Opened by M. Greyson Christoforo (greyltc) - Thursday, 20 May 2021, 07:09 GMT
Last edited by Jan Alexander Steffens (heftig) - Friday, 11 March 2022, 16:37 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To Jan Alexander Steffens (heftig)
Architecture All
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 3
Private No

Details

Description:

The top command shows kworker/u$i:$j+events_unbound pegged to 100% nearly continuously.

I guess it could be this bug, but I'm not sure. https://bugzilla.kernel.org/show_bug.cgi?id=212185

Steps to reproduce:
I'm not sure what's causing it. Maybe a fullish btrfs fs? Deleting a few GB of files didn't help.
This task depends upon

Closed by  Jan Alexander Steffens (heftig)
Friday, 11 March 2022, 16:37 GMT
Reason for closing:  Fixed
Comment by M. Greyson Christoforo (greyltc) - Saturday, 12 June 2021, 10:24 GMT
As noted here[1], I can use the perf tool to see what kernel tasks are hogging my CPU.

I have recorded 30 seconds of backtraces, `sudo perf record -g -a sleep 30`, then I can look at a report for that `sudo perf report`:
```
Samples: 120K of event 'cycles', Event count (approx.): 135152356271
Children Self Command Shared Object Symbol
+ 94.86% 0.00% kworker/u16:11+ [kernel.vmlinux] [k] ret_from_fork
+ 94.86% 0.00% kworker/u16:11+ [kernel.vmlinux] [k] kthread
+ 94.86% 0.00% kworker/u16:11+ [kernel.vmlinux] [k] worker_thread
+ 94.86% 0.00% kworker/u16:11+ [kernel.vmlinux] [k] process_one_work
+ 94.86% 1.31% kworker/u16:11+ [btrfs] [k] btrfs_preempt_reclaim_metadata_space
+ 46.04% 1.84% kworker/u16:11+ [btrfs] [k] flush_space
+ 45.06% 2.13% kworker/u16:11+ [btrfs] [k] need_preemptive_reclaim.part.0
+ 37.93% 13.72% kworker/u16:11+ [kernel.vmlinux] [k] __percpu_counter_sum
+ 21.43% 7.58% kworker/u16:11+ [btrfs] [k] start_transaction
+ 17.75% 4.42% kworker/u16:11+ [kernel.vmlinux] [k] cpumask_next
+ 13.10% 5.01% kworker/u16:11+ [btrfs] [k] __btrfs_end_transaction
+ 11.42% 11.18% kworker/u16:11+ [kernel.vmlinux] [k] _find_next_bit.constprop.0
+ 9.47% 0.83% kworker/u16:11+ [btrfs] [k] btrfs_run_delayed_refs
+ 8.76% 8.50% kworker/u16:11+ [kernel.vmlinux] [k] _raw_spin_lock
+ 8.54% 1.39% kworker/u16:11+ [btrfs] [k] __btrfs_run_delayed_refs
+ 7.16% 3.91% kworker/u16:11+ [kernel.vmlinux] [k] kmem_cache_alloc
+ 6.68% 4.37% kworker/u16:11+ [btrfs] [k] join_transaction
+ 6.14% 6.05% kworker/u16:11+ [kernel.vmlinux] [k] _raw_spin_lock_irqsave
+ 4.81% 3.55% kworker/u16:11+ [kernel.vmlinux] [k] kmem_cache_free
+ 4.75% 2.46% kworker/u16:11+ [btrfs] [k] btrfs_get_alloc_profile
+ 4.11% 1.13% kworker/u16:11+ [kernel.vmlinux] [k] ktime_get
+ 3.70% 0.13% swapper [kernel.vmlinux] [k] do_idle
+ 3.69% 0.00% swapper [kernel.vmlinux] [k] secondary_startup_64_no_verify
+ 3.69% 0.00% swapper [kernel.vmlinux] [k] cpu_startup_entry
+ 2.89% 2.81% kworker/u16:11+ [kernel.vmlinux] [k] read_tsc
+ 2.31% 2.23% kworker/u16:11+ [kernel.vmlinux] [k] memset_erms
+ 2.30% 2.21% kworker/u16:11+ [btrfs] [k] btrfs_put_transaction
+ 1.93% 1.76% kworker/u16:11+ [kernel.vmlinux] [k] find_next_bit
+ 1.87% 0.00% swapper [kernel.vmlinux] [k] cpuidle_enter
+ 1.87% 0.04% swapper [kernel.vmlinux] [k] cpuidle_enter_state
+ 1.77% 1.68% kworker/u16:11+ [kernel.vmlinux] [k] _raw_spin_unlock_irqrestore
+ 1.39% 1.36% swapper [kernel.vmlinux] [k] poll_idle
+ 1.39% 1.13% kworker/u16:11+ [kernel.vmlinux] [k] _raw_spin_unlock
+ 1.24% 1.17% kworker/u16:11+ [kernel.vmlinux] [k] slab_free_freelist_hook
+ 1.16% 0.00% swapper [kernel.vmlinux].init.text [k] start_kernel
+ 0.80% 0.74% kworker/u16:11+ [kernel.vmlinux] [k] memcg_slab_post_alloc_hook
+ 0.75% 0.01% swapper [kernel.vmlinux] [k] schedule_idle
+ 0.75% 0.06% swapper [kernel.vmlinux] [k] __schedule
+ 0.70% 0.61% kworker/u16:11+ [btrfs] [k] btrfs_create_pending_block_groups
0.68% 0.43% kworker/u16:11+ [btrfs] [k] btrfs_select_ref_head
+ 0.57% 0.01% swapper [kernel.vmlinux] [k] pick_next_task_fair
0.56% 0.09% swapper [kernel.vmlinux] [k] newidle_balance
```
So this really seems to be a BTFS issue.

My file system is kinda small and kinda full:
```
# btrfs fi usage -T /
Overall:
Device size: 45.41GiB
Device allocated: 45.41GiB
Device unallocated: 1.00MiB
Device missing: 0.00B
Used: 42.82GiB
Free (estimated): 1.92GiB (min: 1.92GiB)
Free (statfs, df): 1.92GiB
Data ratio: 1.00
Metadata ratio: 1.00
Global reserve: 205.20MiB (used: 0.00B)
Multiple profiles: no

Data Metadata System
Id Path single single single Unallocated
-- -------------- -------- -------- -------- -----------
1 /dev/nvme0n1p7 43.12GiB 2.26GiB 32.00MiB 1.00MiB
-- -------------- -------- -------- -------- -----------
Total 43.12GiB 2.26GiB 32.00MiB 1.00MiB
Used 41.20GiB 1.62GiB 16.00KiB

```

[1]: https://askubuntu.com/a/422151
Comment by CodingCellist (CodingCellist) - Tuesday, 27 July 2021, 10:06 GMT
I don't think it's necessarily Btrfs-related: I'm having the same issue on the hardened kernel (5.12.19-hardened1-1-hardened) and not running Btrfs.

Using the same `perf` command, I get:

```
+ 45,84% 0,00% swapper [unknown] [k]
+ 45,84% 0,00% swapper [unknown] [k]
+ 45,08% 0,00% kworker/0:0+kac [unknown] [k]
+ 45,08% 0,00% kworker/0:0+kac [unknown] [k]
+ 44,95% 0,00% swapper [unknown] [k]
+ 44,94% 0,00% swapper [unknown] [k]
+ 44,94% 0,00% kworker/0:0+kac [unknown] [k]
+ 44,84% 0,00% kworker/0:0+kac [unknown] [k]
+ 44,76% 0,00% kworker/0:0+kac [unknown] [k]
+ 43,70% 0,00% kworker/0:0+kac [unknown] [k]
+ 43,42% 0,00% kworker/0:0+kac [unknown] [k]
+ 43,23% 0,00% kworker/0:0+kac [unknown] [k]
+ 43,04% 0,00% kworker/0:0+kac [unknown] [k]
+ 40,90% 0,00% swapper [unknown] [k]
+ 40,81% 0,05% swapper [unknown] [k]
+ 40,48% 0,00% swapper [unknown] [k]
+ 40,40% 0,00% swapper [unknown] [k]
+ 40,36% 0,01% swapper [unknown] [k]
+ 40,36% 0,00% swapper [unknown] [k]
+ 40,35% 0,00% swapper [unknown] [k]
+ 40,34% 0,00% swapper [unknown] [k]
+ 40,32% 0,00% swapper [unknown] [k]
+ 39,91% 0,00% swapper [unknown] [k]
+ 39,74% 0,00% swapper [unknown] [k]
+ 39,36% 0,00% swapper [unknown] [k]
+ 39,32% 0,16% swapper [unknown] [k]
+ 39,16% 39,16% swapper [unknown] [k]
+ 37,85% 0,01% kworker/0:0+kac [unknown] [k]
+ 32,28% 0,00% kworker/0:0+kac [unknown] [k]
+ 32,22% 0,00% kworker/0:0+kac [unknown] [k]
+ 32,15% 0,00% kworker/0:0+kac [unknown] [k]
+ 24,92% 0,00% kworker/0:0+kac [unknown] [k]
+ 24,38% 0,03% kworker/0:0+kac [unknown] [k]
+ 24,36% 24,35% kworker/0:0+kac [unknown] [k]
+ 23,25% 0,00% kworker/0:0+kac [unknown] [k]
+ 23,09% 0,00% kworker/0:0+kac [unknown] [k]
+ 23,01% 0,00% kworker/0:0+kac [unknown] [k]
+ 22,98% 0,00% kworker/0:0+kac [unknown] [k]
+ 22,85% 0,00% kworker/0:0+kac [unknown] [k]
+ 19,89% 0,00% swapper [unknown] [k]
+ 19,53% 0,00% swapper [unknown] [k]
+ 15,66% 0,00% kworker/0:0+kac [unknown] [k]
+ 10,49% 0,00% kworker/0:0+kac [unknown] [k]
+ 9,67% 0,00% kworker/0:0+kac [unknown] [k]
+ 9,64% 0,00% kworker/0:0+kac [unknown] [k]
+ 9,56% 0,00% kworker/0:0+kac [unknown] [k]
+ 9,48% 0,00% kworker/0:0+kac [unknown] [k]
+ 7,17% 0,00% kworker/0:0+kac [unknown] [k]
+ 7,11% 0,00% kworker/0:0+kac [unknown] [k]
+ 6,84% 0,01% kworker/0:0+kac [unknown] [k]
+ 6,83% 6,83% kworker/0:0+kac [unknown] [k]
```

I'm not familiar with `perf`, so if there is some way to get more useful output (e.g. removing all the "unknown"s) or other things to investigate, please let me know.
Comment by CodingCellist (CodingCellist) - Tuesday, 27 July 2021, 11:17 GMT
A bit more info:

1) According to `top`, the specific processes are `kworker/0:1-kacpi_notify` and `kworker/0:1+kacpid`.

2) A quick internet search reveals a couple of similar threads, although some seem old (~7-9 years ago). From most recent to oldest:
- https://forum.manjaro.org/t/kworker-kacpid-over-70-of-cpu-dual-boot-mac-manjaro/61981
- https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1878919
- https://bbs.archlinux.org/viewtopic.php?id=184913

3) Using `grep . -r /sys/firmware/acpi/interrupts/` on my machine reveals `gpe69` and `gpe16` as the highest, with 43687564 and 51511 calls(?) respectively. Based on the threads in (2), it seems that a workaround is to disable those ACPI interrupts, but since that doesn't seem to solve whatever the underlying problem may be, I'm personally not too keen on that.

4) In case it's useful, here's the output from `journalctl -b 0 -p 5 -g acpi:
```
-- Journal begins at Thu 2021-03-25 16:57:06 CET, ends at Tue 2021-07-27 12:44:21 CEST. --
Jul 27 09:03:58 archlinux kernel: ACPI: [Firmware Bug]: BIOS _OSI(Linux) query ignored
Jul 27 09:03:58 archlinux kernel: ACPI Error: AE_NOT_FOUND, While resolving a named reference package element - TUID (20210105/dspkginit-438)
Jul 27 09:03:58 archlinux kernel: hpet_acpi_add: no address or irqs in _CRS
Jul 27 09:04:15 skidbladnir kernel: acpi PNP0C14:02: duplicate WMI GUID 05901221-D566-11D1-B2F0-00A0C9062910 (first instance was on PNP0C14:01)
Jul 27 09:04:15 skidbladnir kernel: acpi PNP0C14:03: duplicate WMI GUID 05901221-D566-11D1-B2F0-00A0C9062910 (first instance was on PNP0C14:01)
Jul 27 09:04:15 skidbladnir kernel: acpi PNP0C14:04: duplicate WMI GUID 05901221-D566-11D1-B2F0-00A0C9062910 (first instance was on PNP0C14:01)
Jul 27 09:04:15 skidbladnir kernel: acpi PNP0C14:05: duplicate WMI GUID 05901221-D566-11D1-B2F0-00A0C9062910 (first instance was on PNP0C14:01)
Jul 27 09:04:15 skidbladnir kernel: thinkpad_acpi: Disabling thinkpad-acpi brightness events by default...
Jul 27 09:04:16 skidbladnir kernel: ACPI Warning: \_SB.PCI0.GFX0._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20210105/nsarguments-61)
Jul 27 09:04:16 skidbladnir kernel: ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20210105/nsarguments-61)
```
There definitely seems to be something up, but I don't know if it's the kernel's or the BIOS's fault...

My setup is:
- HW: ThinkPad X1 Extreme Gen 1
- CPU: Intel Core i7-8750H
- BIOS: N2EET53W (1.35)
- Kernel: 5.12.19-hardened1-1-hardened
- systemd: 249.2-1
Comment by agapito fernandez (agapito) - Saturday, 31 July 2021, 12:39 GMT
Every time i update my system using pacman, this bug appears. It's really annoying.

Overall:
Device size: 60.00GiB
Device allocated: 58.97GiB
Device unallocated: 1.03GiB
Device missing: 0.00B
Used: 46.27GiB
Free (estimated): 11.78GiB (min: 11.78GiB)
Free (statfs, df): 11.78GiB
Data ratio: 1.00
Metadata ratio: 1.00
Global reserve: 512.00MiB (used: 0.00B)
Multiple profiles: no

Data Metadata System
Id Path single single single Unallocated
-- -------------- -------- -------- -------- -----------
1 /dev/nvme0n1p2 55.96GiB 3.00GiB 12.00MiB 1.03GiB
-- -------------- -------- -------- -------- -----------
Total 55.96GiB 3.00GiB 12.00MiB 1.03GiB
Used 45.21GiB 1.06GiB 64.00KiB
Comment by mattia (nTia89) - Friday, 11 March 2022, 16:01 GMT
Upstream bug has been fixed.

Do you still suffer of this issue?

Loading...