FS#63825 - [linux] iwlwifi crash/hang on kernel 5.2

Attached to Project: Arch Linux
Opened by Chris Billington (chrisjbillington) - Wednesday, 18 September 2019, 15:25 GMT
Last edited by Allan McRae (Allan) - Sunday, 01 March 2020, 07:33 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To Tobias Powalowski (tpowa), Jan Alexander Steffens (heftig)
Architecture x86_64
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

For the last couple of weeks, on the regular Arch kernel (presently 5.2.14.arch2-1), I have been getting regular wifi crashes/hangs. My wifi drops out and the mouse (laptop trackpad) freezes (I am on Xorg, so this is not a Wayland issue). After about 10 seconds wifi returns, but about half the time the mouse remains frozen; the other times it unfreezes and the system apparently returns to normal. The keyboard keeps working in either case, so I can continue to use the machine via the keyboard, and everything else, including wifi, seems functional.

I am on a Dell Precision 5520, which has an Intel wifi card listed by lspci as:

02:00.0 Network controller: Intel Corporation Wireless 8265 / 8275 (rev 78)

The hang/crash only occurs on a specific wireless network at one of my workplaces; it does not occur on my home network or at another workplace. It seems to correlate with network activity: the hang usually occurs immediately upon loading a webpage or doing something else that creates network activity. But the hangs are rare enough that this could be a spurious association on my part.

The hang does not occur if I boot the LTS kernel (presently 4.19.73-1-lts).

Steps to reproduce for me are to boot the regular kernel, connect to my work wifi and use internet normally for about an hour.

Dmesg output attached (dmesg.log) - it should contain exactly one hang/crash right at the end. This is the full output of dmesg, but it doesn't seem to go all the way back to the last reboot, presumably because I hadn't rebooted for several days.

After rebooting into the LTS kernel I see similar output in dmesg (attached: dmesg-lts.log), but I do not notice any hangs or wifi dropouts.

When running the LTS kernel, dmesg output contains a line:

iwlwifi 0000:02:00.0: Hardware error detected. Restarting.

So it is possible the driver knows what's up and that it is indeed a hardware failure, though the fact that behaviour differs between LTS and latest kernels makes me suspect it might be a driver issue and not a hardware issue (or perhaps a regression in how the driver handles a hardware issue).

If nobody else confirms they are experiencing this issue then I will conclude it's likely my hardware.

I will continue to experiment and try to bisect which kernel first has the issue, but it is slow to debug since the issue only occurs a few times a day.
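
Since the hang only occurs a few times a day, it may help to pull the relevant driver messages out of a saved dmesg log automatically rather than by eye. The following is a minimal sketch, assuming the log was saved with the default bracketed dmesg timestamps. The function name and the marker list are illustrative only; the marker strings are taken from the messages quoted in this report and are not an exhaustive list of iwlwifi errors.

```python
import re

# Matches a dmesg line such as:
# [ 1156.169355] iwlwifi 0000:02:00.0: HW error, resetting before reading
LINE_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+iwlwifi\s+\S+:\s+(?P<msg>.*)$"
)

# Message fragments seen in this report (an assumption, not a complete list).
ERROR_MARKERS = ("Hardware error detected", "HW error", "time out after")


def find_iwlwifi_errors(dmesg_text):
    """Return (timestamp_seconds, message) for each matching iwlwifi line."""
    events = []
    for line in dmesg_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and any(m in match.group("msg") for m in ERROR_MARKERS):
            events.append((float(match.group("ts")), match.group("msg")))
    return events
```

Calling `find_iwlwifi_errors(open("dmesg.log").read())` on the attached log would list the seconds-since-boot at which each error was logged, which can then be compared against when a freeze was noticed.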

Closed by  Allan McRae (Allan)
Sunday, 01 March 2020, 07:33 GMT
Reason for closing:  None
Additional comments about closing:  User replaced hardware.
Comment by Chris Billington (chrisjbillington) - Wednesday, 18 September 2019, 18:36 GMT
Googling around, there seem to be a number of similar (but not identical) bugs reported against 5.2, supposedly fixed in 5.3. I am running 5.3 for the moment and will request closure of this task if the problem doesn't appear again within the next day or two.
Comment by Chris Billington (chrisjbillington) - Wednesday, 18 September 2019, 18:46 GMT
Issue still occurs with 5.3. My mouse didn't freeze this time, though the system was otherwise unresponsive during the hang. Same dmesg output as before.
Comment by Chris Billington (chrisjbillington) - Thursday, 19 September 2019, 20:31 GMT
The problem does not occur with kernel 5.1.16 - well, I see the dmesg errors, but my system does not hang, same as with the LTS.
Comment by Chris Billington (chrisjbillington) - Friday, 20 September 2019, 16:38 GMT
Finished bisecting among Arch packages, the first version on which the problem occurs is 5.2.arch2-1.
Comment by Chris Billington (chrisjbillington) - Saturday, 22 February 2020, 20:58 GMT
FWIW I am still experiencing the issue on 5.5.

At one location at my workplace, I can connect to wifi on 5.4 LTS but not 5.5.
At another location at my workplace, I can connect, but get the intermittent freezing with the mouse hanging.
At home, my wifi is fine.

So it's clearly dependent on what access point I'm on. Home is 2.4GHz, work has both 2.4 and 5GHz with the same SSID and the connection seems to switch between the two depending on which is better.

Can provide more dmesg output once it occurs again if that would be useful.
Comment by Chris Billington (chrisjbillington) - Wednesday, 26 February 2020, 16:05 GMT
Dmesg output on kernel 5.5.6.arch1-1 during a freeze - networking dropped out and mouse froze for several seconds, then networking returned but mouse remained frozen (keyboard continued to work).

Again, it says "HW error" - so maybe I should take that seriously and just get a new card...

```
[ 1155.918455] iwlwifi 0000:02:00.0: Error sending SCAN_CFG_CMD: time out after 2000ms.
[ 1155.918460] iwlwifi 0000:02:00.0: Current CMD queue read_ptr 96 write_ptr 97
[ 1156.169355] iwlwifi 0000:02:00.0: HW error, resetting before reading
[ 1156.175786] iwlwifi 0000:02:00.0: Start IWL Error Log Dump:
[ 1156.175788] iwlwifi 0000:02:00.0: Status: 0x00000040, count: 1448630561
[ 1156.175789] iwlwifi 0000:02:00.0: Loaded firmware version: 36.952d9faa.0
[ 1156.175790] iwlwifi 0000:02:00.0: 0x1EFDDC59 | ADVANCED_SYSASSERT
[ 1156.175790] iwlwifi 0000:02:00.0: 0x99F7CBBF | trm_hw_status0
[ 1156.175791] iwlwifi 0000:02:00.0: 0xA0B05C0F | trm_hw_status1
[ 1156.175791] iwlwifi 0000:02:00.0: 0x394BBEA0 | branchlink2
[ 1156.175792] iwlwifi 0000:02:00.0: 0xB9D31597 | interruptlink1
[ 1156.175792] iwlwifi 0000:02:00.0: 0x61A1D5EC | interruptlink2
[ 1156.175793] iwlwifi 0000:02:00.0: 0x3E774EF7 | data1
[ 1156.175793] iwlwifi 0000:02:00.0: 0x76A17898 | data2
[ 1156.175794] iwlwifi 0000:02:00.0: 0xD772FB1E | data3
[ 1156.175794] iwlwifi 0000:02:00.0: 0x6643351A | beacon time
[ 1156.175795] iwlwifi 0000:02:00.0: 0x76E08BBA | tsf low
[ 1156.175795] iwlwifi 0000:02:00.0: 0xA20792F1 | tsf hi
[ 1156.175796] iwlwifi 0000:02:00.0: 0xB0CC5EF5 | time gp1
[ 1156.175796] iwlwifi 0000:02:00.0: 0xAA0F41FD | time gp2
[ 1156.175797] iwlwifi 0000:02:00.0: 0xB300BFE9 | uCode revision type
[ 1156.175798] iwlwifi 0000:02:00.0: 0x4C66FC58 | uCode version major
[ 1156.175798] iwlwifi 0000:02:00.0: 0x46A772EF | uCode version minor
[ 1156.175799] iwlwifi 0000:02:00.0: 0xC6FCFD54 | hw version
[ 1156.175799] iwlwifi 0000:02:00.0: 0x4D5E5603 | board version
[ 1156.175800] iwlwifi 0000:02:00.0: 0x78E813F6 | hcmd
[ 1156.175800] iwlwifi 0000:02:00.0: 0xFE9F36AA | isr0
[ 1156.175801] iwlwifi 0000:02:00.0: 0x516EAEEB | isr1
[ 1156.175801] iwlwifi 0000:02:00.0: 0xB885A302 | isr2
[ 1156.175802] iwlwifi 0000:02:00.0: 0x5917BDD0 | isr3
[ 1156.175802] iwlwifi 0000:02:00.0: 0xC1822268 | isr4
[ 1156.175803] iwlwifi 0000:02:00.0: 0xD5613A22 | last cmd Id
[ 1156.175803] iwlwifi 0000:02:00.0: 0x64CDE95F | wait_event
[ 1156.175804] iwlwifi 0000:02:00.0: 0x99D9E3CC | l2p_control
[ 1156.175804] iwlwifi 0000:02:00.0: 0x26FEAD1C | l2p_duration
[ 1156.175805] iwlwifi 0000:02:00.0: 0xE897107B | l2p_mhvalid
[ 1156.175806] iwlwifi 0000:02:00.0: 0xCE2A11B4 | l2p_addr_match
[ 1156.175806] iwlwifi 0000:02:00.0: 0xCFCA64D6 | lmpm_pmg_sel
[ 1156.175807] iwlwifi 0000:02:00.0: 0x3F14F79B | timestamp
[ 1156.175807] iwlwifi 0000:02:00.0: 0x0EE02DF3 | flow_handler
[ 1156.175893] iwlwifi 0000:02:00.0: Start IWL Error Log Dump:
[ 1156.175894] iwlwifi 0000:02:00.0: Status: 0x00000040, count: 1995997065
[ 1156.175894] iwlwifi 0000:02:00.0: 0x541B3490 | ADVANCED_SYSASSERT
[ 1156.175895] iwlwifi 0000:02:00.0: 0x88525B6D | umac branchlink1
[ 1156.175895] iwlwifi 0000:02:00.0: 0x527871D7 | umac branchlink2
[ 1156.175896] iwlwifi 0000:02:00.0: 0xCFDBA23C | umac interruptlink1
[ 1156.175896] iwlwifi 0000:02:00.0: 0xAE43CBF4 | umac interruptlink2
[ 1156.175897] iwlwifi 0000:02:00.0: 0xD7D4F749 | umac data1
[ 1156.175897] iwlwifi 0000:02:00.0: 0x6BFFB637 | umac data2
[ 1156.175898] iwlwifi 0000:02:00.0: 0x90EFA671 | umac data3
[ 1156.175899] iwlwifi 0000:02:00.0: 0xE277D891 | umac major
[ 1156.175899] iwlwifi 0000:02:00.0: 0xB8C21E48 | umac minor
[ 1156.175900] iwlwifi 0000:02:00.0: 0xD299D8C7 | frame pointer
[ 1156.175900] iwlwifi 0000:02:00.0: 0xF2D338A5 | stack pointer
[ 1156.175901] iwlwifi 0000:02:00.0: 0x00B7EB64 | last host cmd
[ 1156.175901] iwlwifi 0000:02:00.0: 0xB99259DD | isr status reg
[ 1156.175920] iwlwifi 0000:02:00.0: Fseq Registers:
[ 1156.175958] iwlwifi 0000:02:00.0: 0xA5A5A5A2 | FSEQ_ERROR_CODE
[ 1156.176028] iwlwifi 0000:02:00.0: 0xA5A5A5A2 | FSEQ_TOP_INIT_VERSION
[ 1156.176098] iwlwifi 0000:02:00.0: 0xA5A5A5A2 | FSEQ_CNVIO_INIT_VERSION
[ 1156.176169] iwlwifi 0000:02:00.0: 0xA5A5A5A2 | FSEQ_OTP_VERSION
[ 1156.176239] iwlwifi 0000:02:00.0: 0xA5A5A5A2 | FSEQ_TOP_CONTENT_VERSION
[ 1156.176309] iwlwifi 0000:02:00.0: 0xA5A5A5A2 | FSEQ_ALIVE_TOKEN
[ 1156.176379] iwlwifi 0000:02:00.0: 0xA5A5A5A2 | FSEQ_CNVI_ID
[ 1156.176449] iwlwifi 0000:02:00.0: 0xA5A5A5A2 | FSEQ_CNVR_ID
[ 1156.176519] iwlwifi 0000:02:00.0: 0xA5A5A5A2 | CNVI_AUX_MISC_CHIP
[ 1156.176589] iwlwifi 0000:02:00.0: 0xA5A5A5A2 | CNVR_AUX_MISC_CHIP
[ 1156.176660] iwlwifi 0000:02:00.0: 0xA5A5A5A2 | CNVR_SCU_SD_REGS_SD_REG_DIG_DCDC_VTRIM
[ 1156.176730] iwlwifi 0000:02:00.0: 0xA5A5A5A2 | CNVR_SCU_SD_REGS_SD_REG_ACTIVE_VDIG_MIRROR
[ 1156.176735] iwlwifi 0000:02:00.0: Collecting data: trigger 2 fired.
[ 1156.176737] ieee80211 phy0: Hardware restart was requested
[ 1157.566902] iwlwifi 0000:02:00.0: Queue 0 is inactive on fifo 2 and stuck for 2500 ms. SW [96, 97] HW [162, 162] FH TRB=0x0a5a5a5a2
[ 1163.596703] i2c_designware i2c_designware.1: controller timed out
[ 1163.657691] iwlwifi 0000:02:00.0: Queue 4 is inactive on fifo 2 and stuck for 10000 ms. SW [110, 127] HW [162, 162] FH TRB=0x0a5a5a5a2
[ 1163.725218] iwlwifi 0000:02:00.0: Failing on timeout while stopping DMA channel 8 [0xa5a5a5a2]
[ 1163.742854] iwlwifi 0000:02:00.0: Applying debug destination EXTERNAL_DRAM
[ 1163.749017] i2c_designware i2c_designware.1: timeout in disabling adapter
[ 1163.874390] iwlwifi 0000:02:00.0: Applying debug destination EXTERNAL_DRAM
[ 1163.939103] iwlwifi 0000:02:00.0: FW already configured (0) - re-configuring
[ 1165.029655] i2c_designware i2c_designware.1: timeout in disabling adapter
```
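
In the dump above, every Fseq register reads the same value (0xA5A5A5A2), and that value also appears in the later stuck-queue and DMA timeout lines. The sketch below only extracts the `value | name` entries and reports values that repeat; it draws no conclusion about what a repeated value means, which is a question for upstream. The function names and the `min_count` threshold are hypothetical choices for illustration.

```python
import re
from collections import Counter

# Matches dump entries such as:
# [ 1156.175958] iwlwifi 0000:02:00.0: 0xA5A5A5A2 | FSEQ_ERROR_CODE
ENTRY_RE = re.compile(r"iwlwifi\s+\S+:\s+(0x[0-9A-Fa-f]+)\s+\|\s+(.+?)\s*$")


def parse_dump_entries(dmesg_text):
    """Return a list of (name, value) pairs from 'value | name' dump lines."""
    entries = []
    for line in dmesg_text.splitlines():
        match = ENTRY_RE.search(line)
        if match:
            entries.append((match.group(2), int(match.group(1), 16)))
    return entries


def repeated_values(entries, min_count=3):
    """Return {value: count} for values seen in at least min_count entries."""
    counts = Counter(value for _, value in entries)
    return {value: n for value, n in counts.items() if n >= min_count}
```

Running `repeated_values(parse_dump_entries(text))` over a saved log gives a compact summary that can be pasted into an upstream report alongside the full dump.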
Comment by loqs (loqs) - Wednesday, 26 February 2020, 17:59 GMT
I would suggest following https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging to see if upstream believes it is a hardware failure.
You could also try bisecting between 5.1 and 5.2 to try and locate the causal commit.
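
Bisecting is a binary search over an ordered history. As a rough sketch of the idea only: the function below assumes the versions are ordered oldest to newest, that the first is known good and the last known bad, and that the problem persists once introduced. `first_bad` and the `is_bad` callback are hypothetical names; in practice `git bisect` does this bookkeeping on the kernel source tree, and each test means building and booting a kernel and then waiting to see whether the hang occurs.

```python
def first_bad(versions, is_bad):
    """Return the first version for which is_bad(version) is true.

    Assumes versions[0] is known good, versions[-1] is known bad, and
    every version after the first bad one is also bad.
    """
    good, bad = 0, len(versions) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(versions[mid]):
            bad = mid   # the regression is at mid or earlier
        else:
            good = mid  # the regression is after mid
    return versions[bad]
```

The number of versions that need testing grows with the logarithm of the history length, which matters here because each test can take hours of normal use before the hang shows up.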
Comment by Chris Billington (chrisjbillington) - Sunday, 01 March 2020, 02:47 GMT
For twenty bucks and twenty minutes I fixed the issue for me by getting a new wifi card, so I won't be following this up.
