FS#60899 - [linux] brcmfmac null pointer exception in wifi driver

Attached to Project: Arch Linux
Opened by mwarning (mwarning) - Friday, 23 November 2018, 18:15 GMT
Last edited by Andreas Radke (AndyRTR) - Tuesday, 01 March 2022, 21:10 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To Tobias Powalowski (tpowa)
Jan Alexander Steffens (heftig)
Architecture x86_64
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 4
Private No

Details

Description: The WiFi driver does not work anymore. It sometimes crashes with a null pointer exception.

Additional info:
* 4.19.2-arch1-1-ARCH
* Hardware: Dell Inc. XPS 13 9350
* Kernel log error: brcmfmac: brcmf_msgbuf_tx_ioctl: Failed to reserve space in commonring


Steps to reproduce:
1. Use WiFi
2. At some point the connection breaks (brcmf_msgbuf_tx_ioctl and other error messages appear in the kernel log).
3. sudo modprobe -r brcmfmac && sudo modprobe brcmfmac
4. After repeating step 3 a few times, the kernel module crashes.
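
To see how often the error from the report shows up, the kernel log can be filtered for it. This is a minimal sketch, assuming the log is piped in on stdin; the function name `count_commonring_errors` is hypothetical, and the message text is copied from the report above.

```shell
# Hypothetical helper: count occurrences of the commonring error in a
# kernel log read from stdin. The message text comes from the report above.
count_commonring_errors() {
    grep -c 'brcmf_msgbuf_tx_ioctl: Failed to reserve space in commonring'
}
```

For example, `journalctl -k | count_commonring_errors` or `dmesg | count_commonring_errors` would print the number of matching lines. Note that `grep -c` prints 0 and returns a non-zero exit status when nothing matches.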

Closed by  Andreas Radke (AndyRTR)
Tuesday, 01 March 2022, 21:10 GMT
Reason for closing:  Fixed
Additional comments about closing:  Fixed upstream.
Comment by loqs (loqs) - Saturday, 24 November 2018, 13:26 GMT
Please try 4.19.4.arch1-1. If that still has the issue, try 4.20-rc3. If that still has the issue, please bisect between 4.18 and 4.19 and report the result upstream.
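
The bisect suggested here could be set up roughly as follows. This is a sketch under assumptions: it expects a kernel git checkout that has the `v4.18` and `v4.19` tags, and the function name `bisect_start` and the `DRY_RUN` variable are hypothetical. By default it only prints the command.

```shell
# Hypothetical wrapper: print the git bisect setup for a good/bad tag pair.
# Set DRY_RUN=0 to actually run the command inside a kernel checkout.
bisect_start() {
    good=$1
    bad=$2
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "git bisect start $bad $good"
    else
        git bisect start "$bad" "$good"
    fi
}
```

After `DRY_RUN=0 bisect_start v4.18 v4.19`, each candidate kernel would be built, booted and tested, then marked with `git bisect good` or `git bisect bad` until git names the first bad commit. `git bisect reset` returns to the original checkout.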
Comment by mwarning (mwarning) - Monday, 26 November 2018, 16:00 GMT
What helped so far as a temporary measure was to reload the wifi module:
```
sudo modprobe -r brcmfmac
sudo modprobe brcmfmac
```
Comment by mwarning (mwarning) - Wednesday, 28 November 2018, 23:59 GMT
Same problem with 4.19.4-arch1-1-ARCH. Kernel log attached.
Comment by mwarning (mwarning) - Saturday, 01 December 2018, 01:28 GMT
I tried to compile brcmfmac myself but it failed:

git clone git://git.archlinux.org/linux.git
cd linux
make mrproper
git checkout v4.19.4-arch1
cp /usr/lib/modules/$(uname -r)/build/.config ./
cp /usr/lib/modules/$(uname -r)/build/Module.symvers ./
make EXTRAVERSION=-arch1 modules_prepare
make M=drivers/net/wireless/broadcom/brcm80211/brcmfmac
CC [M] drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.o
CC [M] drivers/net/wireless/broadcom/brcm80211/brcmfmac/chip.o
CC [M] drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwil.o
CC [M] drivers/net/wireless/broadcom/brcm80211/brcmfmac/fweh.o
CC [M] drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.o
CC [M] drivers/net/wireless/broadcom/brcm80211/brcmfmac/proto.o
CC [M] drivers/net/wireless/broadcom/brcm80211/brcmfmac/common.o
CC [M] drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.o
CC [M] drivers/net/wireless/broadcom/brcm80211/brcmfmac/firmware.o
CC [M] drivers/net/wireless/broadcom/brcm80211/brcmfmac/feature.o
CC [M] drivers/net/wireless/broadcom/brcm80211/brcmfmac/btcoex.o
CC [M] drivers/net/wireless/broadcom/brcm80211/brcmfmac/vendor.o
CC [M] drivers/net/wireless/broadcom/brcm80211/brcmfmac/pno.o
CC [M] drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcdc.o
CC [M] drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwsignal.o
CC [M] drivers/net/wireless/broadcom/brcm80211/brcmfmac/commonring.o
CC [M] drivers/net/wireless/broadcom/brcm80211/brcmfmac/flowring.o
CC [M] drivers/net/wireless/broadcom/brcm80211/brcmfmac/msgbuf.o
CC [M] drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.o
CC [M] drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.o
CC [M] drivers/net/wireless/broadcom/brcm80211/brcmfmac/usb.o
CC [M] drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.o
CC [M] drivers/net/wireless/broadcom/brcm80211/brcmfmac/debug.o
drivers/net/wireless/broadcom/brcm80211/brcmfmac/debug.c:28:5: error: redefinition of ‘brcmf_debug_create_memdump’
int brcmf_debug_create_memdump(struct brcmf_bus *bus, const void *data,
^~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from drivers/net/wireless/broadcom/brcm80211/brcmfmac/bus.h:20,
from drivers/net/wireless/broadcom/brcm80211/brcmfmac/debug.c:24:
drivers/net/wireless/broadcom/brcm80211/brcmfmac/debug.h:129:5: note: previous definition of ‘brcmf_debug_create_memdump’ was here
int brcmf_debug_create_memdump(struct brcmf_bus *bus, const void *data,
^~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/wireless/broadcom/brcm80211/brcmfmac/debug.c:61:5: error: redefinition of ‘brcmf_debugfs_add_entry’
int brcmf_debugfs_add_entry(struct brcmf_pub *drvr, const char *fn,
^~~~~~~~~~~~~~~~~~~~~~~
In file included from drivers/net/wireless/broadcom/brcm80211/brcmfmac/bus.h:20,
from drivers/net/wireless/broadcom/brcm80211/brcmfmac/debug.c:24:
drivers/net/wireless/broadcom/brcm80211/brcmfmac/debug.h:123:5: note: previous definition of ‘brcmf_debugfs_add_entry’ was here
int brcmf_debugfs_add_entry(struct brcmf_pub *drvr, const char *fn,
^~~~~~~~~~~~~~~~~~~~~~~
make[1]: *** [scripts/Makefile.build:306: drivers/net/wireless/broadcom/brcm80211/brcmfmac/debug.o] Error 1
make: *** [Makefile:1517: _module_drivers/net/wireless/broadcom/brcm80211/brcmfmac] Error 2
Comment by wes (Nisstyre56) - Wednesday, 12 December 2018, 02:34 GMT
Hey, I also have this issue (same model, XPS 13 9350).

Looks like there is an upstream bug report: https://bugzilla.kernel.org/show_bug.cgi?id=201853

Also, I am running 4.19.8 and I still have the issue.

I see some messages related to brcmfmac at boot as well:

```
[ 12.763094] usbcore: registered new interface driver brcmfmac
[ 12.763168] brcmfmac 0000:3a:00.0: enabling device (0000 -> 0002)
[ 12.869720] brcmfmac: brcmf_fw_alloc_request: using brcm/brcmfmac4350-pcie for chip BCM4350/8
[ 12.875052] brcmfmac 0000:3a:00.0: Direct firmware load for brcm/brcmfmac4350-pcie.txt failed with error -2
[ 13.301077] brcmfmac: brcmf_fw_alloc_request: using brcm/brcmfmac4350-pcie for chip BCM4350/8
[ 13.302049] brcmfmac 0000:3a:00.0: Direct firmware load for brcm/brcmfmac4350-pcie.clm_blob failed with error -2
[ 13.302051] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[ 13.303678] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM4350/8 wl0: Oct 22 2015 06:16:26 version 7.35.180.119 (r594535) FWID 01-e791c176
[ 13.328643] brcmfmac 0000:3a:00.0 wlp58s0: renamed from wlan0
```

The errors don't prevent wifi from working (at first), but maybe this could be related to the issue.

I will attempt to find a way to reproduce it consistently.
Comment by wes (Nisstyre56) - Wednesday, 12 December 2018, 03:37 GMT
I'm checking to see if it happens with power management turned off

e.g.,

sudo iw dev wlp58s0 set power_save off

I will report back if this fixes the issue
Comment by mwarning (mwarning) - Wednesday, 12 December 2018, 09:48 GMT
I have been able to compile the module to reproduce the issue (by editing some code), but I do not quite understand the commonring implementation.
Do you know how I can properly define DEBUG to enable the debug output of the module?
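
A possible answer, stated as an assumption rather than something confirmed in this thread: in the kernel tree, DEBUG for brcmfmac appears to be controlled by the `CONFIG_BRCMDBG` option, with the `-DDEBUG` flag added by the Makefile one directory up (`brcm80211`). If that is right, building with `M=` pointed at the `brcmfmac` directory alone would skip that flag, which might also explain the redefinition errors in the earlier build attempt. The sketch below only checks whether a kernel `.config` enables the option; the function name `brcmdbg_enabled` is hypothetical.

```shell
# Hypothetical helper: report whether a kernel .config enables CONFIG_BRCMDBG.
# Assumption: this option is what makes the build pass -DDEBUG to brcmfmac.
# Usage: brcmdbg_enabled [path-to-config]   (defaults to ./.config)
brcmdbg_enabled() {
    grep -q '^CONFIG_BRCMDBG=y$' "${1:-.config}"
}
```

If the option is set, building from the parent directory, for example `make M=drivers/net/wireless/broadcom/brcm80211`, may be worth trying so that the flag is applied.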
Comment by wes (Nisstyre56) - Wednesday, 12 December 2018, 15:05 GMT
It seems to work fine with power management turned off, so power management may be (at least part of) the culprit here.

Please try `sudo iw dev wlp58s0 set power_save off` and report back if you still get the issue.
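
To confirm that the setting took effect, the current state can be read back with `iw dev wlp58s0 get power_save`. This is a minimal sketch that extracts the value, assuming the output has the form `Power save: on` or `Power save: off`; the function name `power_save_state` is hypothetical.

```shell
# Hypothetical helper: extract "on" or "off" from the output of
# `iw dev <interface> get power_save`, read from stdin.
# Assumes the output has the form "Power save: on".
power_save_state() {
    awk -F': ' '/^Power save:/ { print $2 }'
}
```

For example, `iw dev wlp58s0 get power_save | power_save_state` should print `off` after the command above. The `iw` setting is a runtime one, so checking it again after a resume or a module reload shows whether it is still in effect.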
Comment by wes (Nisstyre56) - Wednesday, 12 December 2018, 16:25 GMT
@mwarning check this out and let me know if it works: https://stackoverflow.com/a/50522223/903589
Comment by loqs (loqs) - Wednesday, 12 December 2018, 16:32 GMT
@mwarning the brcmfmac.debug module option (bitmask) has no effect?
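
Whether the installed module actually exposes a `debug` parameter can be checked from the output of `modinfo -p brcmfmac`. This is a minimal sketch, assuming that output lists one parameter per line as `name:description`; the function name `has_module_param` is hypothetical.

```shell
# Hypothetical helper: check whether `modinfo -p <module>` output (on stdin)
# lists a parameter with the given name. Lines look like "name:description".
has_module_param() {
    grep -q "^$1:"
}
```

For example, `modinfo -p brcmfmac | has_module_param debug && echo "debug parameter available"`. If the parameter is present, a bitmask could be passed when loading the module (`sudo modprobe brcmfmac debug=<mask>`); the meaning of each bit should be taken from the driver source rather than from this sketch.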
Comment by Kunda (Kunda) - Monday, 11 February 2019, 10:51 GMT
any progress on this?
Comment by mwarning (mwarning) - Monday, 11 February 2019, 12:11 GMT
There is no progress so far. I am too busy right now (sorry about that). I also switched to Void Linux. The exact same problem is there as well (4.19.13_1).
Comment by Kunda (Kunda) - Friday, 22 February 2019, 12:53 GMT
any progress on this?
Comment by wes (Nisstyre56) - Friday, 22 February 2019, 16:39 GMT
@Kunda

Please see the upstream bug report https://bugzilla.kernel.org/show_bug.cgi?id=201853

It looks like there is still no fix. However, I've found a workaround that involves disabling power management on the chipset: https://bbs.archlinux.org/viewtopic.php?id=242382

If you're able to, it would be helpful to try and do a git bisect of the kernel (see the bugzilla discussion) in order to identify which commit introduced the problem. I'm guessing the most reliable way to reproduce this would be to bring your laptop out of suspend mode and then wait a couple hours.
Comment by mattia (nTia89) - Monday, 28 February 2022, 16:17 GMT
The upstream bug has been fixed and I cannot reproduce the issue anymore.

Is it still an issue for you?
Comment by Kunda (Kunda) - Monday, 28 February 2022, 21:05 GMT
Forgot about this bug. Haven't experienced this issue in a long time.

I'm using Kernel: 5.15.24-1-rt31-MANJARO x86_64
