FS#54922 - [linux] 4.12 - bonding module not working with wireless
Attached to Project:
Arch Linux
Opened by James (thx1138) - Monday, 24 July 2017, 19:02 GMT
Last edited by Evangelos Foutras (foutrelis) - Sunday, 20 August 2017, 06:39 GMT
Details
Going from linux 4.11 to 4.12, currently 4.12.3-1, the
bonding module received a patch,
[next] bonding: fix active-backup transition
https://patchwork.ozlabs.org/patch/746683/

which requires correct reporting of link speed using the kernel net/core/ethtool.c __ethtool_get_link_ksettings() presumably giving an error at 581

    err = dev->ethtool_ops->get_settings(dev, &cmd);

Apparently, this function does not play nicely with wireless drivers, at least for the Atheros ath5k and ath9k, and for several Realtek wireless drivers I have tried. The consequences are two-fold.

Note: drivers/net/bonding/bond_main.c

    if (bond_update_speed_duplex(slave)) {
        slave->link = BOND_LINK_DOWN;
        netdev_warn(bond->dev,
                    "failed to get link speed/duplex for %s\n",
                    slave->dev->name);
        continue;
    }

1) While the wireless drivers work perfectly well alone, and the wired network interfaces continue to work with the bonding module, when used in conjunction with the bonding module, a wireless interface will be put into the "down" state, and will not work with the bonding module.

2) Apparently, this "bond_update_speed_duplex(slave)" function executes 10 times per second, and
   a) the log file will be "spammed" with "failed to get link speed/duplex for blah" warnings continuously, 10 times per second, and
   b) these log messages may be sent to the console, 10 times per second, effectively creating a "Denial of Service" at the console. A remote terminal is then needed to reconfigure networking, to remove the wireless slave from the bonding module.

The problem has been communicated privately to the bonding module developers:
    Andy Gospodarek <andy@greyhouse.net>
    Mahesh Bandewar <mahesh@bandewar.net>
    Thomas Davis <tadavis@lbl.gov>

I am not certain whether to blame the kernel ethtool or the wireless drivers for the "get_settings()" error, but the bonding module can be blamed for spamming the log file.

No course of action has yet been determined. For the moment, the options are:
1) revert the patch, or
2) downgrade the kernel
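To make the two consequences concrete, here is a small user-space sketch that mimics the shape of the bond_main.c check quoted above. It is not kernel code: the struct, the enum, the "reports_speed" flag and the function names are all made up for illustration, and the assumption that ten passes correspond to one second (a 100 ms monitoring interval) comes only from the "10 times per second" observation in this report.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel's link states. */
enum link_state { LINK_UP, LINK_DOWN };

struct slave {
    const char *name;
    bool reports_speed;    /* false models a driver whose speed query fails */
    enum link_state link;
};

/* Stand-in for bond_update_speed_duplex(): nonzero means failure. */
int update_speed_duplex(const struct slave *s)
{
    return s->reports_speed ? 0 : -1;
}

/* One monitoring pass; returns the number of warnings it logged. */
int monitor_pass(struct slave *slaves, int n)
{
    int warnings = 0;

    for (int i = 0; i < n; i++) {
        if (update_speed_duplex(&slaves[i])) {
            /* Same shape as the quoted check: force down, warn, skip. */
            slaves[i].link = LINK_DOWN;
            fprintf(stderr, "failed to get link speed/duplex for %s\n",
                    slaves[i].name);
            warnings++;
            continue;
        }
        /* Simplification: treat every slave that reports a speed as up. */
        slaves[i].link = LINK_UP;
    }
    return warnings;
}

/* Run several passes; with a 100 ms interval, 10 passes model one second. */
int simulate_monitor(struct slave *slaves, int n, int passes)
{
    int total = 0;

    for (int p = 0; p < passes; p++)
        total += monitor_pass(slaves, n);
    return total;
}
```

With one wired slave that reports a speed and one wireless slave that does not, every pass leaves the wireless slave down and logs one more warning. Nothing in the check remembers that the same failure was already reported on the previous pass, which is where the log spam comes from.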
Closed by Evangelos Foutras (foutrelis)
Sunday, 20 August 2017, 06:39 GMT
Reason for closing: Fixed
Additional comments about closing: linux 4.12.8-2
Edit:
This commit is a fix for another issue, so reverting it will reintroduce that issue, correct?
Is the issue still present in linux 4.13-rc2? Is there any public discussion of this issue / bug report?
Andy commented, privately:
"To me it's a bit of an interesting problem both technically and
politically. Mahesh's patches were written to address issues where
link-speed could not be calculated/collected and this was (I'm
guessing) causing issues with 802.3ad mode (4) since link-speed is
used to chose the active aggregator."
Clearly, the problem has not been thought through entirely. I have also seen other problems where wireless connection speeds are reported incorrectly or not at all, both with wireless utilities and with the bonding module's ability to determine the "better" network interface when "primary_reselect" is set to "better". There may be a more general problem with wireless drivers failing to report connection speeds properly.
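One way to check from user space whether a given interface reports a speed at all is to read its sysfs "speed" attribute, for example /sys/class/net/wlan0/speed (the interface name is only an example). As far as I know, that attribute is backed by the same __ethtool_get_link_ksettings() call, so a read that fails with "Invalid argument" would point at the same problem, but treat that as an assumption rather than a confirmed diagnosis. A minimal sketch, with a made-up helper name:

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper: read a link speed in Mb/s from a sysfs-style file.
 * Returns 0 and stores the speed on success, or a negative errno value
 * when the file cannot be opened or read, or when the speed is unknown.
 */
int read_link_speed(const char *path, long *speed_mbps)
{
    FILE *f = fopen(path, "r");
    int n, err;

    if (!f)
        return -errno;

    errno = 0;
    n = fscanf(f, "%ld", speed_mbps);
    err = errno;            /* save before fclose() can overwrite it */
    fclose(f);

    if (n != 1)             /* read error or no number in the file */
        return err ? -err : -EINVAL;
    if (*speed_mbps < 0)    /* an unknown speed may be shown as -1 */
        return -ENODATA;
    return 0;
}
```

Calling read_link_speed("/sys/class/net/eth0/speed", &speed) on a wired interface that is up should return 0 with the negotiated speed. A negative return for a wireless interface would suggest the driver is not reporting a speed to the kernel.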
As far as I know, no additional work has been done on the bonding module since April 2017 to address this issue.
There is not yet any public discussion or bug report, mainly because I do not know where else to report this, and the developers have not suggested any venue. So far, only Andy has responded to my emails. And, I have not had much luck using the LKML as a general forum for these kinds of issues. I did send a note to Matthew Wilcox <matthew@wil.cx>, the name listed in net/core/ethtool.c, asking about the relationship between the kernel ethtool and these wireless drivers, but I don't know if Matthew is still involved, since the original date for ethtool.c was 2003.
I am not sure Arch will do a revert that fixes one thing but breaks another when neither the upstream linux-stable queue nor linux-mainline has taken that revert,
and upstream has given no position on what course of action should be taken.
A third option would be to disable bonding for the 4.12 series until the issue is resolved upstream, as 4.11 is now EOL.
Andy commented:
"Unfortunately I'm not sure how many are really using bonding and
wireless with this driver, so this might not be a case that has been
tested much."
How many people have automatic wired and wireless switching on their laptops? I like it, but I had to build a custom solution to make it work. So maybe not much testing.
> A third option would be to disable bonding for the 4.12 series until the issue is resolved upstream, as 4.11 is now EOL.
That may be the most practical. At least, I wanted to make a note about the issue, in case anyone else is running into this.
I can update this thread when I hear back more from the developers.
Please follow at:
Bug 196547 - Since 4.12 - bonding module not working with wireless drivers
https://bugzilla.kernel.org/show_bug.cgi?id=196547
As Andy mentioned, this bug may have a political aspect, so please make your voices heard at kernel.org.