FS#69475 - [Linux 5.10.11.arch1-1] Flow regression

Attached to Project: Arch Linux
Opened by Torus (T0t0) - Saturday, 30 January 2021, 20:20 GMT
Last edited by Andreas Radke (AndyRTR) - Wednesday, 21 April 2021, 10:11 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To No-one
Architecture All
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 1
Private No

Details

Description:
Flow rate (throughput) regression on my fiber connection (reduced by about 50%). I ran a test with the LTS kernel and got a normal flow rate. I have noticed this since version 5.10.

Additional info:
* package version(s) > 5.10.11.arch1-1
* lspci > Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 12)

Steps to reproduce:
Use the latest kernel.

Closed by  Andreas Radke (AndyRTR)
Wednesday, 21 April 2021, 10:11 GMT
Reason for closing:  None
Comment by Torus (T0t0) - Saturday, 30 January 2021, 20:35 GMT
Sorry, not reduced by 50% but divided by 50 :/

Edit: I had run the flow test on a Sunday during rush hour. After several more tests, the flow rate is divided by 10.
Comment by loqs (loqs) - Saturday, 30 January 2021, 22:03 GMT
Is the issue still present in 5.11-rc5? Was the issue introduced in 5.10.1.arch1-1? If so can you bisect between 5.9 and 5.10 to find the causal commit?
Comment by Torus (T0t0) - Saturday, 30 January 2021, 23:17 GMT
The linux-mainline compilation didn't work; I'll try again tomorrow if necessary. What I found is that the 5.9.14 kernel was OK, but the bug appears as of the 5.10.1 kernel (also tested with 5.10.2 and 5.10.4-arch2-1).
Comment by loqs (loqs) - Saturday, 30 January 2021, 23:57 GMT
You can obtain linux-mainline prebuilt from [1].

[1] https://wiki.archlinux.org/index.php/Unofficial_user_repositories#miffe
Comment by Torus (T0t0) - Sunday, 31 January 2021, 00:12 GMT
Thanks for the link, it's much quicker :D
Bad news: linux-mainline also has the bug.
Comment by loqs (loqs) - Tuesday, 02 February 2021, 00:26 GMT
See attached file. It was designed for forum posting so please excuse the formatting.
Comment by Torus (T0t0) - Tuesday, 02 February 2021, 00:56 GMT
You're leaning more towards Arch's kernel rather than the official one?
If so [1] should help to solve this problem.

[1] https://git.archlinux.org/svntogit/packages.git/commit/?h=packages/linux&id=282c90d1e14ef7dd0f56f8c8192ba1c830e906ad
Comment by loqs (loqs) - Tuesday, 02 February 2021, 02:17 GMT
No, I suspect the issue is a change in the official kernel. The instructions I provided use the config from 5.9.14 to build 5.9 from upstream without any Arch patches.
If that does not have the issue, the instructions then build 5.10, again without any patches. If that does have the issue, you would then bisect to locate which upstream commit causes it.
If you build 5.10 and it does not have the issue, then it may be a packaging issue.
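
The decision procedure described in this comment can be sketched as a small function. This is only an illustration: the function name and the returned strings are hypothetical, and the first branch (an unpatched 5.9 build that already shows the issue) is an assumption that the comment does not cover.

```python
def next_step(vanilla_5_9_bad: bool, vanilla_5_10_bad: bool) -> str:
    """Map the two unpatched test builds to the next action in the thread."""
    if vanilla_5_9_bad:
        # Assumption, not stated in the comment: without a good baseline
        # there is nothing to bisect from, so the build setup is suspect.
        return "check build setup"
    if vanilla_5_10_bad:
        # 5.9 good and 5.10 bad, both without Arch patches:
        # an upstream commit between them is responsible.
        return "bisect upstream"
    # Both unpatched builds are fine, so look at Arch's packaging.
    return "suspect packaging"


print(next_step(False, True))
```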
Comment by Torus (T0t0) - Tuesday, 02 February 2021, 16:13 GMT
Does the AUR linux-git package do the job?
I have no idea how to compile a kernel without PKGBUILD.

It's amazing that there are no other users with this problem.
Comment by loqs (loqs) - Tuesday, 02 February 2021, 17:15 GMT
The PKGBUILD and config used for 5.9.14.arch1-1 are obtained in the first git checkout. You then replace that with the PKGBUILD provided so uname will match pkgver and the documentation is not built.
You could use linux-git, but you would need to change the config to that for 5.9 or 5.10, and it uses scripts/setlocalversion --save-scmversion, so the uname output will not be updated during the bisection.
Comment by Torus (T0t0) - Tuesday, 02 February 2021, 18:51 GMT
Like linux-mainline, linux-git won't compile (I suspect a lack of RAM). So I couldn't test it :(
Comment by loqs (loqs) - Tuesday, 02 February 2021, 21:41 GMT
Add --log to the makepkg invocation and post the log please. Link is to linux-git 5.9. Does that have the issue?

(link removed)
Comment by Torus (T0t0) - Tuesday, 02 February 2021, 21:53 GMT
Here is the terminal output; the errors are in French.
The bug only appears from version 5.10 onwards.
Comment by loqs (loqs) - Tuesday, 02 February 2021, 21:59 GMT
git bisection relies on having a known good build and a known bad build, made with the same toolchain in the same manner, so that the only difference is the source code.
That is why you have to test the built 5.9 and confirm it is good, and test the built 5.10 and confirm it is bad. Otherwise you can get a false result due to a broken build system.
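
As a rough illustration of why the good and bad endpoints matter, here is a minimal sketch of the binary search that bisection performs. It assumes a linear history (the real `git bisect` also handles merges), and the commit names, the position of the first bad commit, and the figure of 16,000 commits are all made up for the example.

```python
from typing import Callable, Sequence, Tuple


def bisect_first_bad(commits: Sequence[str],
                     is_bad: Callable[[str], bool]) -> Tuple[str, int]:
    """Return the first bad commit and the number of builds tested.

    commits is ordered oldest to newest; commits[0] must be known good
    and commits[-1] known bad, otherwise the result is meaningless.
    """
    good, bad = 0, len(commits) - 1
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1  # each test stands for one kernel build and one flow test
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad], tests


# Hypothetical history: 16,000 commits after the good one, regression at c12345.
history = [f"c{i}" for i in range(16001)]
first_bad, builds = bisect_first_bad(history, lambda c: int(c[1:]) >= 12345)
print(first_bad, builds)
```

The number of builds grows with the logarithm of the number of commits, so even a range of many thousands of commits needs only a dozen or so test builds.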

You were building on /tmp and ran out of disk space?
Edit:
Updated PKGBUILD to produce smaller packages. You should still copy linux-git off /tmp to avoid the build failure.
   PKGBUILD (5.8 KiB)
Comment by Torus (T0t0) - Tuesday, 02 February 2021, 23:28 GMT
I confirm once again that 5.9 (the git version you linked) is working.
Comment by loqs (loqs) - Tuesday, 02 February 2021, 23:51 GMT
Link for 5.10 (link removed)

Please email me using the address for the file owner to keep this thread from growing very long while bisecting the issue.
Comment by Torus (T0t0) - Friday, 05 February 2021, 02:53 GMT
Thank you very much for your help and patience (about fifteen kernel compilations) @loqs. You introduced me to the `git bisect` command. Its result points to a problem with this commit:

https://lore.kernel.org/r/20200709132344.760-5-john.ogness@linutronix.de

Comment by Torus (T0t0) - Friday, 05 February 2021, 20:32 GMT
I see that this ticket is unassigned. Should I contact the committer directly and explain the problem?
Comment by loqs (loqs) - Friday, 05 February 2021, 21:31 GMT
I am trying to see the connection between a change in printk and flow rate.
Is there a lot of output in dmesg that could make printk the limit on the network flow rate?
Comment by Torus (T0t0) - Friday, 05 February 2021, 22:04 GMT
dmesg -Hk is filled with lines concerning ufw. I have attached an excerpt of the output; 'kauditd_printk_skb: 1642 callbacks suppressed' is present about 40 times. 'systemd-journald[210]: /dev/kmsg buffer overrun, some messages lost' is present when running a flow tester.

Edit: With linux-lts, I have only one entry with dmesg -Hk: 'kauditd_printk_skb: 5 callbacks suppressed'. With the test, I have no more entries (except the ufw lines).
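
To compare kernels more systematically than by eye, the suppressed-callback messages can be counted from saved dmesg output. The following is a minimal sketch; the function name is hypothetical, and the sample lines simply reuse the messages quoted in this comment rather than a real capture.

```python
import re
from typing import Iterable, Tuple

# Matches the message format quoted in the comment above.
SUPPRESSED = re.compile(r"kauditd_printk_skb: (\d+) callbacks suppressed")


def summarize_suppressed(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (number of matching lines, total callbacks suppressed)."""
    count = 0
    total = 0
    for line in lines:
        match = SUPPRESSED.search(line)
        if match:
            count += 1
            total += int(match.group(1))
    return count, total


sample = [
    "kauditd_printk_skb: 1642 callbacks suppressed",
    "systemd-journald[210]: /dev/kmsg buffer overrun, some messages lost",
    "kauditd_printk_skb: 5 callbacks suppressed",
]
print(summarize_suppressed(sample))  # (2, 1647)
```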
Comment by loqs (loqs) - Saturday, 06 February 2021, 00:05 GMT
The 3.1 KiB print_skb.txt contains audit records for firejail's apparmor use. I am not seeing any output related to ufw, which I would guess would be iptables?

I do not know what your flow tester involves, so this may not be viable, but if you disable ufw and/or firejail, does that fix the flow rate issue?
Comment by Torus (T0t0) - Saturday, 06 February 2021, 00:24 GMT
Good point. I can't explain it, but disabling ufw on the faulty 5.10.13-arch1-1 kernel solves the problem. Firejail is OK.

Do you have an explanation?
Comment by loqs (loqs) - Saturday, 06 February 2021, 00:54 GMT
The amount of logging produced by ufw is so high that it is the limiting factor in your system's ability to manage flows. The bisect found that the change in how printk is handled by the kernel made that issue more apparent.
I do not know what ufw is logging, but try to reduce it. There is also ulogd [1], which I believe would avoid logging to dmesg.

[1] https://www.netfilter.org/projects/ulogd/
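
The explanation above can be illustrated with a toy model in which every logged packet pays an extra fixed cost. This is a deliberately simplified sketch: the function and all numbers are hypothetical, chosen only so that the slowdown matches the factor of 10 reported earlier in the thread, and they are not measurements of printk or ufw.

```python
def packets_per_second(base_cost_us: float,
                       log_cost_us: float,
                       logged_fraction: float) -> float:
    """Packet rate when a fraction of packets also pays a logging cost.

    Costs are in microseconds per packet; all values are illustrative.
    """
    per_packet_us = base_cost_us + logged_fraction * log_cost_us
    return 1_000_000 / per_packet_us


# Hypothetical costs: 10 us to handle a packet, 90 us more to log it.
without_logging = packets_per_second(10.0, 90.0, 0.0)
with_logging = packets_per_second(10.0, 90.0, 1.0)
print(without_logging / with_logging)
```

In this model, lowering either the logged fraction or the cost of each log call restores the rate, which is why both reducing ufw's logging and moving it out of dmesg are plausible fixes.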
Comment by Torus (T0t0) - Monday, 08 February 2021, 01:39 GMT
I removed ufw and installed ulogd. After a little time adapting to iptables, everything works.

Apart from that, do you think this ticket should be kept open?
