FS#75138 - Suspend can hang if ath9k wireless interface is up but un-associated
Attached to Project:
Arch Linux
Opened by Greg Erwin (Odysseus) - Wednesday, 22 June 2022, 04:08 GMT
Last edited by Toolybird (Toolybird) - Sunday, 13 November 2022, 05:33 GMT
Opened by Greg Erwin (Odysseus) - Wednesday, 22 June 2022, 04:08 GMT
Last edited by Toolybird (Toolybird) - Sunday, 13 November 2022, 05:33 GMT
|
Details
Suspend can hang if ath9k wireless interface is up but
un-associated
Description: Starting from kernel version 5.18.arch1-1 and onwards, I typically experience long delays in the suspend process. It can be anywhere from a few seconds to upwards of two minutes. Additional info: After including 'initcall_debug' to my kernel cmdline, I see the following sequence of journal messages when the hang occurs: [ 1290.501909] ieee80211 phy0: PM: calling wiphy_suspend+0x0/0x2b0 [cfg80211] @ 86, parent: 0000:06:00.0 ... [ 1294.038295] queueing ieee80211 work while going to suspend [ 1348.011752] hwrng: no data available [ 1358.043585] ieee80211 phy0: PM: wiphy_suspend+0x0/0x2b0 [cfg80211] returned 0 after 67541642 usecs Additionally, the 'hwrng: no data available' message appears in the log sporadically at other times wile the PC is running normally. In particular, the suspend hang will not occur until the first time the message is reported in the log. In contrast, a normal suspend process on kernel version 5.17.9.arch1-1 results in this sequence: [ 28.654797] ieee80211 phy0: PM: calling wiphy_suspend+0x0/0x2b0 [cfg80211] @ 86, parent: 0000:06:00.0 ... [ 28.663307] ieee80211 phy0: PM: wiphy_suspend+0x0/0x2b0 [cfg80211] returned 0 after 8476 usecs With effort, I bisected the issue down to two linux kernel commits that are both necessary to trigger the suspend hang: [fcd09c90c3c5254b18ef34e30c57c65d34290a84] ath9k: use hw_random API instead of directly dumping into random.c [c321e907aa4803d562d6e70ebed9444ad082f953] random: continually use hwgenerator randomness I have a 'Qualcomm Atheros AR93xx Wireless Network Adapter' installed which is typically not in use and remains un-associated with any AP. While the interface is up but un-associated, the hardware random number generator provided by the ath9k driver will block and not yield any data. It is not until I start the wpa_supplicant service that 'dd if=/dev/hwrng count=1' will unblock and return data. While connected to wifi, I never see the 'hwrng: no data available' message or experience the suspend hang. Similarly, if I explicitly set 'ip link wlp6s0 down' then /sys/class/misc/hw_random/rng_available is empty, and instead of blocking, 'dd if=/devv/hwrng count=1' returns 'dd: error reading '/dev/hwrng': No such device'. While the interface is down, I do not get any "hwrng: no data available" log messages or suspend hangs. I am guessing that it is this available but blocking behavior of the ath9k hardware number generator that is problematic. The ath9k driver is the only provider of a hwrng in my system. I would assume that having an additional hwrng available, like a TPM chip or a modern CPU would prevent this issue from triggering. Steps to reproduce: 0) Have a wireless network adapter that uses the ath9k driver. 1) Install a kernel that includes the two commits called out above, such as 5.18.arch1-1 or later. 2) Include 'initcall_debug' in your kernel cmdline. 3) Disconnect the wireless interface, but leave it up. 4) Look in /sys/class/misc/hw_random/rng_available to blacklist or rmmod any drivers that provide a hwrng other than ath9k. 5) Wait for "demsg -w | grep hwrng" to show "hwrng: no data available". 6) Suspend. 7) The system will go unresponsive for a long period of time before finally entering the suspended state. 8) Resume. 9) Confirm the hang with 'dmesg | grep wiphy_suspend' which will show it returned 0 after a few million usecs. |
This task depends upon
Closed by Toolybird (Toolybird)
Sunday, 13 November 2022, 05:33 GMT
Reason for closing: Fixed
Additional comments about closing: Reporter says "Patched upstream."
Sunday, 13 November 2022, 05:33 GMT
Reason for closing: Fixed
Additional comments about closing: Reporter says "Patched upstream."
See also
FS#74899Have you contacted the upstream kernel developers?
Interesting how this seems to manifest differently for others.
I can't recall ever seeing my system hang except when entering suspend.
ath9k-devel <ath9k-devel@qca.qualcomm.com> "linux-wireless@vger.kernel.org" <linux-wireless@vger.kernel.org> "Linux Crypto Mailing List" <linux-crypto@vger.kernel.org>
Edit:
rereading https://lore.kernel.org/all/YgykjbhgdqeYyiY5%40owl.dominikbrodowski.net/ this analysis seems to match the issue.
Edit2:
That was on v2 and was addressed in v3.
By the way, I went back and spent more time with a kernel that only includes the "ath9k: use hw_random API" commit, and was eventually able to reproduce the suspend hang. In this case, it took an hour for the first "hwrng: no data available" message to appear. I think the "random: continually use hwgenerator randomness" commit simply makes the issue more probable to occur.
> I'll give it a shot
Any luck?
It looks like version 12 of the patch started by Jason Donenfeld and modified by Herbert Xu will get picked up.
I think these are all the relevant threads on LKML:
https://bit.ly/3PZmijo (My initial post, Jason's patches v1-v6)
https://bit.ly/3zSG4aD (Jason's patches v7-v10)
https://bit.ly/3JsaT97 (Herbert Xu's additions v11-v12)
[1] https://gitlab.com/knurd42/regzbot/-/blob/main/docs/getting_started.md
[1] https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git/commit/?id=36cb6494429bd64b27b7ff8b4af56f8e526da2b4
[1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v5.19.17&id=a54676340b0e45e84f2d21df2e71b3b7888fde89
[2] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v6.0.3&id=320f2dce198b2db9abe4269e2df9b19852951c38