FS#59473 - [systemd] Weird networking problem with Realtek RTL8111/8168/8411 (rev 0c) ethernet
Attached to Project:
Arch Linux
Opened by Y.G. (theYinYeti) - Saturday, 28 July 2018, 10:47 GMT
Last edited by Dave Reisner (falconindy) - Monday, 07 January 2019, 11:20 GMT
Details
I’ve discussed this issue at length on Freenode##Networking,
to no avail…
== Description:
I will describe the current situation (see further for some history). I have a mix of good and bad behaviours:

GOOD::
* On this PC, pacman works flawlessly.
* In Firefox, videos from Youtube or elsewhere do not display any hiccup.
* In Firefox, web browsing seems to be mostly OK.
* All LAN networking seems to be working correctly.
* Any download through ssh -D (socks proxy) to my home server, or through TOR, works fine.

BAD::
* In Firefox (without the ssh-based socks proxy), most downloads simply fail (Firefox-specific timeout?), eg:
  - https://dist.torproject.org/torbrowser/7.5.6/tor-browser-linux64-7.5.6_en-US.tar.xz
  - https://f-droid.org/FDroid.apk
* The same downloads in curl or wget do not fail, but show _very long_ waits when throughput is 0/s, interleaved with very short spikes during which speed is at its maximum (a bit more than 2MB/s), eg:
  - https://f-droid.org/FDroid.apk is downloaded by curl in 20 minutes 50 sec. (avg. 6kB/s)!

== Additional info:
=== package version(s)
On 2018-07-10, I ran `pacman -Syu`, which I had not done since 2018-06-09. Now, I have networking problems on this PC when going to the Internet, and I am almost sure the problems started after this upgrade. The real certainties I have are:
* I used to not have these problems on this PC.
* On my home server, which runs Archlinux last updated on 2018-05-12, network is working perfectly.

=== config and/or log files etc. (and some history)
On 2018-07-10, when I ran the upgrade, I was using NetworkManager.service, and the nm-applet in Gnome.
```
#/etc/NetworkManager/NetworkManager.conf
[main]
plugins=keyfile
dhcp=dhclient
dns=none

#/etc/NetworkManager/system-connections/Connexion\ filaire
[connection]
id=Wired Connection
uuid=92dc2000…
type=ethernet
permissions=
timestamp=1531289613

[ethernet]
mac-address=44:…
mac-address-blacklist=

[ipv4]
dns=1.1.1.1;
dns-search=
ignore-auto-dns=true
method=auto

[ipv6]
addr-gen-mode=eui64
dns-search=
method=auto

#/etc/resolv.conf
search lan
nameserver 1.1.1.1
nameserver 8.8.8.8
```
(I tried other DNS too)

At that time, *even pacman failed to run properly!*

When nothing I tried, and nothing suggested to me, did any good, I decided to reinstall Archlinux. However, the new installation is on new LVs from my whole-disk LVM. So I can still mount and read (and maybe even boot) the old Archlinux if needed. In contrast to the old Archlinux, the new one:
* boots from a (new) GPT-formatted SSD in EFI mode (instead of booting from the big MBR-formatted disk in legacy-BIOS mode);
* is full-systemd (systemd-networkd + systemd-timesyncd + systemd-resolved, instead of NetworkManager + ntpd + /etc/resolv.conf).
```
#/etc/systemd/network/eth.network
[Match]
Name=en*

[Network]
DHCP=ipv4
DNS=1.1.1.1
DNS=8.8.8.8
IPForward=yes

#/etc/resolv.conf -> /run/systemd/resolve/resolv.conf
nameserver 1.1.1.1
nameserver 8.8.8.8
nameserver 192.168.1.1
```

== Steps to reproduce:
`curl --noproxy '*' 'https://f-droid.org/FDroid.apk' | wc -c`

== Some observations (might be unrelated)
Tests show that ICMP packets >64 fail on the network, eg. a `ping -s 65 -c 1` to each IP shown by `traceroute ipv4.google.com` results in:
```
=== 192.168.1.1 (my ISP-provided VDSL router)
--- 192.168.1.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.457/0.457/0.457/0.000 ms
=== * * *
=== 212.194.171.40 (be23.cbr01-ntr.net.bbox.fr; BBox is my ISP)
--- 212.194.171.40 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
=== * * *
=== 72.14.213.208
--- 72.14.213.208 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 16.655/16.655/16.655/0.000 ms
=== * * *
=== 66.249.95.102 / 216.239.59.208 / 209.85.251.142
--- 66.249.95.102 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
--- 216.239.59.208 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
--- 209.85.251.142 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
### always 100% packet loss after that, until the end:
=== 108.170.230.208 / 216.58.198.206 (par10s27-in-f14.1e100.net aka. ipv4.google.com)
--- 108.170.230.208 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
--- 216.58.198.206 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
```
While this does seem abnormal, the other machines on the LAN are not disturbed (home server, and 2 more PCs).

A (slightly older) run of MTR (https://ptpb.pw/EbWa) in TCP mode (not ICMP) shows a very different result:
```
Start: 2018-07-18T20:21:12+0200
HOST: sedentaire                    Loss%   Snt   Last    Avg  Best   Wrst  StDev
  1.|-- _gateway                     0.0%    10    0.5    0.5   0.5    0.6    0.0
  2.|-- ???                         100.0    10    0.0    0.0   0.0    0.0    0.0
  3.|-- be23.cbr01-ntr.net.bbox.fr   0.0%    10   16.4   16.9  15.9   17.5    0.6
  4.|-- la12.rpt01-ix2.net.bbox.fr  20.0%    10  7166.  2688.  15.0  7166. 3024.8
  5.|-- 72.14.213.208                0.0%    10   15.6   15.6  15.0   16.6    0.4
  6.|-- 108.170.231.95               0.0%    10   16.1   16.5  15.9   18.4    0.9
  7.|-- 108.170.244.241              0.0%    10   16.6   16.2  15.4   17.2    0.5
  8.|-- 209.85.255.106               0.0%    10   16.4   16.9  16.4   17.3    0.3
  9.|-- 108.170.236.72               0.0%    10   24.6   24.4  23.9   25.1    0.4
 10.|-- 216.239.58.132               0.0%    10   24.4   24.2  23.8   24.7    0.3
 11.|-- 108.170.246.161              0.0%    10   24.8   24.9  23.6   33.5    3.0
 12.|-- 216.239.43.211               0.0%    10   24.0   24.1  23.6   24.3    0.2
 13.|-- lhr25s08-in-f4.1e100.net     0.0%    10   24.1   24.1  23.5   24.3    0.2
```
Same for the server hosting the Tor-browser download: https://ptpb.pw/L7ce

My network addresses:
```
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 44:8a:5b:… brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.99/24 brd 192.168.1.255 scope global dynamic enp2s0
       valid_lft 77795sec preferred_lft 77795sec
    inet6 fe80::468a:5bff:…/64 scope link
       valid_lft forever preferred_lft forever
```
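To put numbers on the reproduction step, a small shell sketch could let curl report the transfer size, total time and average speed itself instead of piping the body through `wc -c`. The helper names `measure_download` and `avg_kb_per_s` are made up for this example; the `--write-out` variables are standard curl ones. As a sanity check on the figures reported above, 20 minutes 50 seconds is 1250 s, so an average of 6 kB/s implies a file of roughly 7.5 MB (the byte count in the comment is derived from that, not measured).

```shell
# Average rate in kB/s (1 kB = 1000 bytes, integer division)
# from a byte count ($1) and a duration in seconds ($2).
avg_kb_per_s() {
    echo $(( $1 / $2 / 1000 ))
}

# Download a URL while bypassing any configured proxy, discard the body,
# and print "<bytes> <seconds> <bytes per second>" as measured by curl.
measure_download() {
    curl --noproxy '*' --silent --output /dev/null \
         --write-out '%{size_download} %{time_total} %{speed_download}\n' "$1"
}

# Example (needs network access):
#   measure_download 'https://f-droid.org/FDroid.apk'
# Cross-check of the reported figures (1250 s, about 7.5 MB implied):
#   avg_kb_per_s 7500000 1250
```

Running `measure_download` once directly and once through the `ssh -D` proxy (without `--noproxy`) would give comparable figures for the good and bad paths.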
Closed by Dave Reisner (falconindy)
Monday, 07 January 2019, 11:20 GMT
Reason for closing: Fixed
Additional comments about closing: change was reverted with systemd v240
… used to be :-( I just updated my home server, also running Archlinux, and guess what? Same problem! For me, this is confirmation that:
* This is either an Archlinux or an upstream issue, or an uncanny incompatibility between my ISP and some upstream evolution…
* This is _not_ a hardware problem.
Interestingly, my home server actually runs two hosts (the hardware host itself running Archlinux, and a systemd-nspawn Archlinux guest), and even though I ran `pacman -Syu` on both, *only the host misbehaves*; the guest still works fine! In my opinion, this tells us that:
* This is _not_ a kernel issue (since the host and guest share the same kernel).
It happens that both the host and guest of my home server have their /etc folder under Git, using etckeeper, and etckeeper keeps track of installed packages. So I ran a diff between a `git diff` on the packages-list file of the guest, and the same `git diff` on the host. Here is the result:
```
< -busybox 1.28.3-1
< +busybox 1.28.4-2
< -ddclient 3.8.3-2
< +ddclient 3.8.3-3
< -exim 4.91-1
< +exim 4.91-2
< -haproxy 1.8.7-1
< +haproxy 1.8.9-1
< -iodine 0.7.0-3
< +iodine 0.7.0-4
< -libpgm 5.2.122-2
< +libpgm 5.2.122-3
< -lua51-expat 1.3.0-3
< +lua51-expat 1.3.0-4
< -lua51-lpeg 1.0.1-1
< +lua51-lpeg 1.0.1-2
< -lua 5.3.4-2
< +lua 5.3.5-1
< -mercurial 4.6-1
< +mercurial 4.6.2-1
< -nginx-mainline 1.13.12-1
< +nginx-mainline 1.15.2-1
< -nginx-mainline-mod-lua 0.10.11-1
< +nginx-mainline-mod-lua 0.10.13-1
< -nginx-mainline-mod-ndk 0.3.0-11
< +nginx-mainline-mod-ndk 0.3.0-13
< -perl-crypt-openssl-random 0.11-6
< +perl-crypt-openssl-random 0.15-1
< -perl-http-message 6.16-1
< +perl-http-message 6.18-1
< -perl-io-socket-ssl 2.055-1
< +perl-io-socket-ssl 2.056-1
< -perl-mail-dkim 0.52-1
< +perl-mail-dkim 0.53-1
< -perl-net-dns 1.15-1
< +perl-net-dns 1.16-1
< -php-fpm 7.2.5-2
< +php-fpm 7.2.8-1
< -php-tidy 7.2.5-2
< +php-tidy 7.2.8-1
< -prosody 1:0.10.0-2
< +prosody 1:0.10.2-1
< -zeromq 4.2.2-2
< +zeromq 4.2.5-1
> +aom 1.0.0-1
> -avahi 0.7+4+gd8d8c67-1
> +avahi 0.7+16+g1cc2b8e-1
> -btrfs-progs 4.16-2
> +btrfs-progs 4.17-1
> -docbook-xml 4.5-6
> +docbook-xml 4.5-7
> -dovecot 2.3.1-2
> +dovecot 2.3.2.1-1
> -ffmpeg 1:4.0-1
> +ffmpeg 1:4.0.2-1
> -gsm 1.0.17-1
> +gsm 1.0.18-1
> -intel-ucode 20180425-1
> +intel-ucode 20180703-1
> +iperf3 3.5-1
> -jansson 2.10-3
> +jansson 2.11-1
> -lame 3.100-1
> +lame 3.100-2
> -libcups 2.2.7-2
> +libcups 2.2.8-3
> -libdaemon 0.14-3
> +libdaemon 0.14-4
> -libdrm 2.4.91-3
> +libdrm 2.4.92-1
> -libhx 3.22-1
> +libhx 3.22-2
> -libid3tag 0.15.1b-8
> +libid3tag 0.15.1b-9
> -libiec61883 1.2.0-4
> +libiec61883 1.2.0-5
> -libmariadbclient 10.1.32-1
> +libmariadbclient 10.1.34-1
> -libogg 1.3.3-2
> +libogg 1.3.3-3
> -libomxil-bellagio 0.9.3-1
> +libomxil-bellagio 0.9.3-2
> -libpulse 11.1-1
> +libpulse 12.2-2
> -libutempter 1.1.6-2
> +libutempter 1.1.6-3
> -libva 2.1.0-1
> +libva 2.2.0-2
> -libxinerama 1.1.3-2
> +libxinerama 1.1.4-1
> +libxv 1.0.11-1
> -libxxf86vm 1.1.4-1
> +libxxf86vm 1.1.4-2
> -linux 4.16.8-1
> +linux 4.17.10-1
> -linux-firmware 20180416.b562d2f-1
> +linux-firmware 20180606.d114732-1
> -llvm-libs 6.0.0-4
> +llvm-libs 6.0.1-1
> -lm_sensors 3.4.0-2
> +lm_sensors 3.4.0-4
> -lvm2 2.02.177-5
> +lvm2 2.02.180-1
> -mesa 18.0.3-1
> +mesa 18.1.4-1
> -minidlna 1.2.1-3
> +minidlna 1.2.1-4
> -mkinitcpio-busybox 1.28.3-1
> +mkinitcpio-busybox 1.28.4-1
> -msmtp 1.6.6-1
> +msmtp 1.6.8-1
> -msmtp-mta 1.6.6-1
> +msmtp-mta 1.6.8-1
> -nextcloud 13.0.2-1
> +nextcloud 13.0.5-1
> -nfsidmap 2.3.1-1
> +nfsidmap 2.3.2-2
> -nfs-utils 2.3.1-1
> +nfs-utils 2.3.2-2
> -openjpeg2 2.3.0-1
> +openjpeg2 2.3.0-2
> -pciutils 3.5.6-1
> +pciutils 3.6.1-1
> -php-embed 7.2.5-2
> +php-embed 7.2.8-1
> -php-imagick 3.4.3-3
> +php-imagick 3.4.3-4
> -pigeonhole 0.5.1-2
> +pigeonhole 0.5.2-2
> -postgresql 10.4-2
> +postgresql 10.4-3
> -postgresql-old-upgrade 9.6.8-1
> +postgresql-old-upgrade 9.6.9-1
> -python2-psycopg2 2.7.4-1
> +python2-psycopg2 2.7.5-1
> -python 3.6.5-2
> +python 3.6.6-1
> -python-idna 2.6-1
> +python-idna 2.7-2
> -python-pyasn1 0.4.2-1
> +python-pyasn1 0.4.3-1
> -python-pyasn1-modules 0.2.1-1
> +python-pyasn1-modules 0.2.2-1
> -python-pyopenssl 17.5.0-2
> +python-pyopenssl 18.0.0-1
> -python-setuptools 1:39.1.0-1
> +python-setuptools 1:39.2.0-2
> -sdl2 2.0.8-8
> +sdl2 2.0.8-9
> -speexdsp 1.2rc3-2
> +speexdsp 1.2rc3-3
> -systemd-sysvcompat 238.76-1
> +systemd-sysvcompat 239.0-2
> -usbutils 009-1
> +usbutils 010-1
> -uwsgi 2.0.17-1
> +uwsgi 2.0.17.1-2
> -uwsgi-plugin-php 2.0.17-1
> +uwsgi-plugin-php 2.0.17.1-2
> -vid.stab 1.1-1
> +vid.stab 1.1-2
> -vim 8.0.1815-1
> +vim 8.1.0022-1
> -vim-runtime 8.0.1815-1
> +vim-runtime 8.1.0022-1
> -wayland 1.14.0-1
> +wayland 1.15.0-1
> -x265 2.7-1
> +x265 2.8-1
> -xfsprogs 4.15.1-1
> +xfsprogs 4.17.0-1
> -zita-alsa-pcmi 0.2.0-3
> +zita-alsa-pcmi 0.2.0-4
> -zstd 1.3.4-1
> +zstd 1.3.5-1
```
Lines beginning with “<” show pacman updates that happened on the guest, but not on the host.
Lines beginning with “>” show pacman updates that happened on the host, but not on the guest.
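For anyone wanting to repeat this comparison, here is a minimal sketch of the "diff of two diffs" idea. It assumes each machine's `git diff` of the package-list file has first been saved to a file; the function names and the `guest.diff` / `host.diff` file names are hypothetical, and the exact etckeeper file and revisions are not given in this report.

```shell
# Keep only the added/removed package lines of a saved `git diff`,
# dropping the "--- a/..." and "+++ b/..." file headers and context lines.
pkg_changes() {
    grep -E '^[+-][^+-]' "$1"
}

# Compare the package changes of the guest ($1) with those of the host ($2).
# Output lines starting with "<" exist only in the guest's changes,
# lines starting with ">" only in the host's changes.
compare_pkg_changes() {
    guest_tmp=$(mktemp)
    host_tmp=$(mktemp)
    pkg_changes "$1" > "$guest_tmp"
    pkg_changes "$2" > "$host_tmp"
    diff "$guest_tmp" "$host_tmp" | grep -E '^[<>]'
    rm -f "$guest_tmp" "$host_tmp"
}

# Example:
#   compare_pkg_changes guest.diff host.diff
```

Package changes common to both machines drop out of the result, which is why upgrades applied identically on host and guest are absent from the listing.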
Besides, after a couple of unrelated upgrades (nginx-mainline-mod-ndk, nginx-mainline-mod-lua, and php-imagick) and some services’ restarts, the guest now has the same problem as the host, which might place the problem in some core service; maybe systemd-related?
For reference, here is the full pacman output on the host:
https://ptpb.pw/8qug
and on the guest:
https://ptpb.pw/fZjb
One thing is *sure*: somewhere in one of these 2 pastes lies the explanation for the network issue…
```
[2018-07-28 21:21] [ALPM] downgraded libseccomp (2.3.3-1 -> 2.3.2-2)
[2018-07-28 21:21] [ALPM] downgraded zlib (1:1.2.11-3 -> 1:1.2.11-2)
[2018-07-28 21:21] [ALPM] downgraded idnkit (1.0-4 -> 1.0-3)
[2018-07-28 21:21] [ALPM] downgraded libutil-linux (2.32.1-1 -> 2.32-3)
[2018-07-28 21:21] [ALPM] downgraded gcc-libs (8.1.1+20180531-1 -> 8.1.0-1)
[2018-07-28 21:21] [ALPM] downgraded gdbm (1.16-1 -> 1.14.1-1)
[2018-07-28 21:21] [ALPM] downgraded bind-tools (9.13.0-2 -> 9.12.1-1)
[2018-07-28 21:21] [ALPM] downgraded file (5.33-3 -> 5.33-1)
[2018-07-28 21:21] [ALPM] downgraded gc (7.6.6-1 -> 7.6.4-1)
[2018-07-28 21:21] [ALPM] downgraded gcc (8.1.1+20180531-1 -> 8.1.0-1)
[2018-07-28 21:21] [ALPM] downgraded libtool (2.4.6+40+g6ca5e224-7 -> 2.4.6+40+g6ca5e224-6)
[2018-07-28 21:21] [ALPM] downgraded libunistring (0.9.10-1 -> 0.9.9-1)
[2018-07-28 21:21] [ALPM] downgraded guile (2.2.4-1 -> 2.2.3-1)
[2018-07-28 21:21] [ALPM] downgraded hwids (20180518-1 -> 20171003-1)
[2018-07-28 21:21] [ALPM] downgraded intel-ucode (20180703-1 -> 20180425-1)
[2018-07-28 21:21] [ALPM] downgraded libelf (0.171-1 -> 0.170-1)
[2018-07-28 21:21] [ALPM] downgraded libnftnl (1.1.1-1 -> 1.0.9-1)
[2018-07-28 21:21] [ALPM] downgraded lz4 (1:1.8.2-2 -> 1:1.8.1.2-1)
[2018-07-28 21:21] [ALPM] downgraded libgpg-error (1.32-1 -> 1.31-1)
[2018-07-28 21:21] [ALPM] downgraded libgcrypt (1.8.3-1 -> 1.8.2-1)
[2018-07-28 21:21] [ALPM] downgraded libsystemd (239.0-2 -> 238.76-1)
[2018-07-28 21:21] [ALPM] downgraded iproute2 (4.17.0-1 -> 4.16.0-1)
[2018-07-28 21:21] [ALPM] downgraded sysfsutils (2.1.0-10 -> 2.1.0-9)
[2018-07-28 21:21] [ALPM] reinstalled iputils (20161105.1f2bb12-2)
[2018-07-28 21:21] [ALPM] downgraded jansson (2.11-1 -> 2.10-3)
[2018-07-28 21:21] [ALPM] downgraded kmod (25-1 -> 24-1)
[2018-07-28 21:21] [ALPM] downgraded libarchive (3.3.2-2 -> 3.3.2-1)
[2018-07-28 21:21] [ALPM] downgraded libdaemon (0.14-4 -> 0.14-3)
[2018-07-28 21:21] [ALPM] downgraded libdrm (2.4.92-1 -> 2.4.91-3)
[2018-07-28 21:21] [ALPM] downgraded libedit (20180525_3.1-1 -> 20170329_3.1-1)
[2018-07-28 21:21] [ALPM] downgraded libidn2 (2.0.5-1 -> 2.0.4-2)
[2018-07-28 21:21] [ALPM] downgraded libnghttp2 (1.32.0-1 -> 1.31.1-1)
[2018-07-28 21:21] [ALPM] downgraded libutempter (1.1.6-3 -> 1.1.6-2)
[2018-07-28 21:21] [ALPM] downgraded linux-firmware (20180606.d114732-1 -> 20180416.b562d2f-1)
[2018-07-28 21:21] [ALPM] downgraded mkinitcpio-busybox (1.28.4-1 -> 1.28.3-1)
[2018-07-28 21:21] [ALPM] downgraded pam (1.3.1-1 -> 1.3.0-2)
[2018-07-28 21:21] [ALPM] downgraded util-linux (2.32.1-1 -> 2.32-3)
[2018-07-28 21:21] [ALPM] downgraded systemd (239.0-2 -> 238.76-1)
[2018-07-28 21:21] [ALPM] downgraded linux (4.17.10-1 -> 4.16.8-1)
[2018-07-28 21:21] [ALPM] downgraded netctl (1.17-1 -> 1.16-1)
[2018-07-28 21:21] [ALPM] downgraded npth (1.6-1 -> 1.5-1)
[2018-07-28 21:21] [ALPM] downgraded p11-kit (0.23.12-1 -> 0.23.10-1)
[2018-07-28 21:21] [ALPM] downgraded pciutils (3.6.1-1 -> 3.5.6-1)
[2018-07-28 21:21] [ALPM] downgraded procps-ng (3.3.15-1 -> 3.3.14-1)
[2018-07-28 21:21] [ALPM] downgraded systemd-sysvcompat (239.0-2 -> 238.76-1)
[2018-07-28 21:21] [ALPM] downgraded usbutils (010-1 -> 009-1)
[2018-07-28 21:21] [ALPM] downgraded zstd (1.3.5-1 -> 1.3.4-1)
```
```
diff --git a/iproute2/ematch_map b/iproute2/ematch_map
index 1823983..4d6bb2f 100644
--- a/iproute2/ematch_map
+++ b/iproute2/ematch_map
@@ -5,3 +5,4 @@
4 meta
7 canid
8 ipset
+9 ipt
diff --git a/iproute2/rt_protos b/iproute2/rt_protos
index 82cf9c4..2a9ee01 100644
--- a/iproute2/rt_protos
+++ b/iproute2/rt_protos
@@ -16,16 +16,3 @@
15 ntk
16 dhcp
42 babel
-
-#
-# Used by me for gated
-#
-254 gated/aggr
-253 gated/bgp
-252 gated/ospf
-251 gated/ospfase
-250 gated/rip
-249 gated/static
-248 gated/conn
-247 gated/inet
-246 gated/default
```
* after iproute2: all is working fine;
* after linux-firmware: all is working fine;
* after *systemd*: problem is back.
Then I noticed that reverting systemd, libsystemd and systemd-sysvcompat was not enough to bring back a working curl; I had to reboot as well. Besides, I see that up/down-grading these 3 packages triggers mkinitcpio.
Thus my best guess is that the problem lies in the generated kernel _as a consequence_ of the newer versions of systemd, libsystemd and systemd-sysvcompat.
As a last test, I set “IgnorePkg = systemd libsystemd systemd-sysvcompat” and otherwise upgraded the whole system. All is still OK.
On my home server (which is still up to the latest packages), running `sysctl net.ipv4.tcp_ecn=0` solves the issue :-)
Now I wonder:
* should I set this permanently, or
* is this considered a bug (albeit with a workaround), and a future systemd release would fix the issue without needing a sysctl tweak?
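If the first option is chosen, one possible way to make the setting survive reboots is a sysctl.d drop-in. This is only a sketch: the function name `write_ecn_dropin` and the file name `99-tcp-ecn.conf` are arbitrary choices for this example, and it assumes the usual sysctl.d behaviour where a later-sorting file in /etc/sysctl.d overrides earlier defaults.

```shell
# Write a sysctl.d drop-in that turns off TCP ECN negotiation.
# $1: target directory (defaults to /etc/sysctl.d, which needs root).
# The file name 99-tcp-ecn.conf is an arbitrary choice for this sketch.
write_ecn_dropin() {
    dir=${1:-/etc/sysctl.d}
    printf 'net.ipv4.tcp_ecn = 0\n' > "$dir/99-tcp-ecn.conf"
}

# Typical use, as root:
#   write_ecn_dropin          # create /etc/sysctl.d/99-tcp-ecn.conf
#   sysctl --system           # reload all sysctl configuration files now
#   sysctl net.ipv4.tcp_ecn   # verify the running value
```

Deleting the drop-in and running `sysctl --system` again (or rebooting) would return to whatever default the installed packages ship.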
According to https://github.com/systemd/systemd/issues/9087, the current state is a “wait-and-see” release, where the ECN change is done by systemd, to see if people complain (and I just did); the plan being to incorporate the change into the kernel if things go well (apparently not…).