FS#72231 - mlx5_core module no longer works with ConnectX-4 Lx (LTS kernel 5.10.68-1-lts)
Attached to Project: Arch Linux
Opened by Michael Brock (hrast) - Friday, 24 September 2021, 22:26 GMT
Last edited by Andreas Radke (AndyRTR) - Wednesday, 29 September 2021, 05:07 GMT
Details
Description:
mlx5_core apparently changed somewhere between 5.10.61 and 5.10.68. After updating to the current LTS kernel (5.10.68-1-lts), the driver no longer loads correctly:

# dmesg | grep mlx
[ 18.001398] mlx5_core 0000:02:00.0: firmware version: 14.30.1004
[ 18.001430] mlx5_core 0000:02:00.0: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
[ 18.219630] mlx5_core 0000:02:00.0: E-Switch: Total vports 10, per vport: max uc(1024) max mc(16384)
[ 18.222305] mlx5_core 0000:02:00.0: Port module event: module 0, Cable unplugged
[ 20.335506] mlx5_core 0000:02:00.0: E-Switch: cleanup
[ 21.058695] mlx5_core 0000:02:00.0: init_one:1371:(pid 306): mlx5_load_one failed with error code -22
[ 21.059022] mlx5_core: probe of 0000:02:00.0 failed with error -22
[ 21.059413] mlx5_core 0000:02:00.1: firmware version: 14.30.1004
[ 21.059443] mlx5_core 0000:02:00.1: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
[ 21.261641] mlx5_core 0000:02:00.1: E-Switch: Total vports 10, per vport: max uc(1024) max mc(16384)
[ 21.263970] mlx5_core 0000:02:00.1: Port module event: module 1, Cable plugged
[ 22.935551] mlx5_core 0000:02:00.1: E-Switch: cleanup
[ 23.627161] mlx5_core 0000:02:00.1: init_one:1371:(pid 306): mlx5_load_one failed with error code -22
[ 23.627463] mlx5_core: probe of 0000:02:00.1 failed with error -22

Reverting to the previous kernel (5.10.61-1-lts) resolves the issue:

# dmesg | grep mlx
[ 26.818341] mlx5_core 0000:02:00.0: firmware version: 14.30.1004
[ 26.818370] mlx5_core 0000:02:00.0: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
[ 27.019482] mlx5_core 0000:02:00.0: E-Switch: Total vports 10, per vport: max uc(1024) max mc(16384)
[ 27.021747] mlx5_core 0000:02:00.0: Port module event: module 0, Cable unplugged
[ 27.032310] mlx5_core 0000:02:00.1: firmware version: 14.30.1004
[ 27.032369] mlx5_core 0000:02:00.1: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
[ 27.250396] mlx5_core 0000:02:00.1: E-Switch: Total vports 10, per vport: max uc(1024) max mc(16384)
[ 27.253858] mlx5_core 0000:02:00.1: Port module event: module 1, Cable plugged
[ 27.265641] mlx5_core 0000:02:00.0: MLX5E: StrdRq(0) RqSz(1024) StrdSz(256) RxCqeCmprss(0)
[ 27.475908] mlx5_core 0000:02:00.0: Supported tc offload range - chains: 4294967294, prios: 4294967295
[ 27.492488] mlx5_core 0000:02:00.1: MLX5E: StrdRq(0) RqSz(1024) StrdSz(256) RxCqeCmprss(0)
[ 27.697332] mlx5_core 0000:02:00.1: Supported tc offload range - chains: 4294967294, prios: 4294967295
[ 27.717900] mlx5_core 0000:02:00.0 enp2s0f0np0: renamed from eth0
[ 27.806765] mlx5_core 0000:02:00.1 enp2s0f1np1: renamed from eth1
[ 52.465495] mlx5_core 0000:02:00.1 enp2s0f1np1: Link down

# lspci | grep -i mel
02:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
02:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

# mstconfig -d 02:00.0 q
Device #1:
----------
Device type: ConnectX4LX
Name: MCX4121A-ACA_Ax
Description: ConnectX-4 Lx EN network interface card; 25GbE dual-port SFP28; PCIe3.0 x8; ROHS R6
Device: 02:00.0

Additional info:
* package version: linux-lts 5.10.68-1

I have two systems with identical cards that both show the same issue after the upgrade, despite different CPU types (E3-1241 v3 vs. E5-2650 v4).
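Note on the error code: kernel functions conventionally return negated errno values, so the -22 in the failing log corresponds to errno 22, EINVAL ("Invalid argument"). For anyone comparing saved dmesg output across several machines, the following is a minimal sketch that extracts failed probes and names the error. The function name find_probe_failures and the regular expression are illustrative assumptions, not part of any kernel or Mellanox tooling, and the sample text is copied from the log above.

```python
import errno
import re

# Matches kernel lines such as:
#   "mlx5_core: probe of 0000:02:00.0 failed with error -22"
PROBE_FAIL = re.compile(
    r"(?P<driver>\S+): probe of (?P<dev>\S+) failed with error (?P<code>-?\d+)"
)


def find_probe_failures(log_text):
    """Return (driver, device, errno name) for each failed probe in log_text."""
    failures = []
    for match in PROBE_FAIL.finditer(log_text):
        # The kernel reports negated errno values, so strip the sign first.
        code = abs(int(match.group("code")))
        name = errno.errorcode.get(code, "UNKNOWN")
        failures.append((match.group("driver"), match.group("dev"), name))
    return failures


# Lines taken from the failing 5.10.68-1-lts log in this report.
SAMPLE = """\
[ 21.058695] mlx5_core 0000:02:00.0: init_one:1371:(pid 306): mlx5_load_one failed with error code -22
[ 21.059022] mlx5_core: probe of 0000:02:00.0 failed with error -22
[ 23.627463] mlx5_core: probe of 0000:02:00.1 failed with error -22
"""

if __name__ == "__main__":
    for driver, dev, name in find_probe_failures(SAMPLE):
        print(f"{driver} {dev}: {name}")
```

The sketch only identifies which devices failed to probe and with which generic error; it says nothing about which code path inside mlx5_load_one rejected the configuration.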
drivers/net/ethernet/mellanox/mlx5/core/devlink.c | 52 +++
drivers/net/ethernet/mellanox/mlx5/core/en/fs.h | 6
drivers/net/ethernet/mellanox/mlx5/core/en_fs.c | 10
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 15 +
drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 5
drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c | 18 -
Can you confirm this resolves the issue?