FS#56957 - [systemd] systemd-networkd crash after updating to linux 4.14.11

Attached to Project: Arch Linux
Opened by Jonathan Liu (net147) - Thursday, 04 January 2018, 23:52 GMT
Last edited by Christian Hesse (eworm) - Wednesday, 10 January 2018, 20:30 GMT
Task Type Bug Report
Category Upstream Bugs
Status Closed
Assigned To Dave Reisner (falconindy)
Christian Hesse (eworm)
Architecture All
Severity Critical
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 3
Private No

Details

Description:
Ethernet (configured as bridge using systemd-networkd) stops working after updating linux from 4.14.10-1 to 4.14.11-1.

Additional info:
* linux 4.14.11-1
* systemd 236.0-2
* Ethernet controller: Intel Corporation Ethernet Connection (2) I218-V (e1000e kernel module)

Steps to reproduce:
* Configure systemd-networkd with ethernet (eth0) added to a bridge (lan0). Bridge has static IP.
* Update from 4.14.10-1 to linux 4.14.11-1 and restart system
* Notice the network (configured using systemd-networkd) doesn't come up
* Check systemd journal and you see the following:
[ 7.418986] server systemd-networkd[391]: eth0: Gained carrier
[ 7.419405] server systemd-networkd[391]: eth0: Configured
[ 7.419485] server systemd-networkd[391]: lan0: Gained carrier
[ 7.418779] server kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[ 7.418829] server kernel: lan0: port 1(eth0) entered blocking state
[ 7.418832] server kernel: lan0: port 1(eth0) entered forwarding state
[ 7.420320] server systemd-networkd[391]: lan0: Configured
[ 7.420513] server systemd-networkd[391]: Assertion 'IN_SET(link->state, LINK_STATE_SETTING_ADDRESSES, LINK_STATE_SETTING_ROUTES, LINK_STATE_FAILED, LINK_STATE_LINGER)' failed at ../systemd-stable/src/network/networkd-link.c:824, function route_handler(). Aborting.
[ 7.430147] server systemd[1]: Created slice system-systemd\x2dcoredump.slice.
[ 7.444568] server systemd[1]: Started Process Core Dump (PID 757/UID 0).
[ 7.646901] server systemd[1]: systemd-networkd.service: Main process exited, code=dumped, status=6/ABRT
[ 7.647039] server systemd[1]: systemd-networkd.service: Failed with result 'core-dump'.
[ 7.647452] server systemd[1]: systemd-networkd.service: Service has no hold-off time, scheduling restart.
[ 7.647552] server systemd[1]: systemd-networkd.service: Scheduled restart job, restart counter is at 1.
[ 7.647633] server systemd[1]: Stopped Network Service.
[ 7.660918] server systemd[1]: Starting Network Service...
[ 7.663672] server systemd-coredump[758]: Process 391 (systemd-network) of user 193 dumped core.

Stack trace of thread 391:
#0 0x00007f0fa8884860 raise (libc.so.6)
#1 0x00007f0fa8885ec9 abort (libc.so.6)
#2 0x00007f0fa8490768 log_assert_failed_realm (libsystemd-shared-236.so)
#3 0x000055d7a9ce4949 n/a (systemd-networkd)
#4 0x00007f0fa84ca505 sd_netlink_process (libsystemd-shared-236.so)
#5 0x00007f0fa84ca904 n/a (libsystemd-shared-236.so)
#6 0x00007f0fa8538c70 n/a (libsystemd-shared-236.so)
#7 0x00007f0fa8538eab sd_event_dispatch (libsystemd-shared-236.so)
#8 0x00007f0fa853902e sd_event_run (libsystemd-shared-236.so)
#9 0x00007f0fa853921c sd_event_loop (libsystemd-shared-236.so)
#10 0x000055d7a9caf7bd n/a (systemd-networkd)
#11 0x00007f0fa8870f4a __libc_start_main (libc.so.6)
#12 0x000055d7a9caf90a n/a (systemd-networkd)
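
For reference, the bridge setup from the steps to reproduce might look roughly like the sketch below. The reporter's actual unit files are not included in this report, so the file names, the scratch directory, and the 192.0.2.x addresses (a documentation range) are assumptions chosen only for illustration. The script writes to a temporary directory, not to /etc/systemd/network.

```shell
#!/bin/sh
# Sketch only: writes example systemd-networkd units into a scratch directory.
# File names and addresses are hypothetical; the reporter's real config is unknown.
set -e

NETWORKD_DIR="${NETWORKD_DIR:-$(mktemp -d)}"

# Bridge device lan0
cat > "$NETWORKD_DIR/lan0.netdev" <<'EOF'
[NetDev]
Name=lan0
Kind=bridge
EOF

# Attach the physical interface eth0 to the bridge
cat > "$NETWORKD_DIR/eth0.network" <<'EOF'
[Match]
Name=eth0

[Network]
Bridge=lan0
EOF

# Static IP on the bridge (example addresses, assumed)
cat > "$NETWORKD_DIR/lan0.network" <<'EOF'
[Match]
Name=lan0

[Network]
Address=192.0.2.10/24
Gateway=192.0.2.1
EOF

echo "wrote example units to $NETWORKD_DIR"
```

To try it on a test machine, the three files would be copied to /etc/systemd/network and systemd-networkd restarted.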

Closed by  Christian Hesse (eworm)
Wednesday, 10 January 2018, 20:30 GMT
Reason for closing:  Fixed
Additional comments about closing:  systemd 236.0-3
Comment by loqs (loqs) - Friday, 05 January 2018, 18:13 GMT
If the system is Intel based can you please add the kernel boot option pti=off and see if the issue still occurs.
If it does or the system is AMD based please bisect between 4.14.11 and 4.14.10 to find the bad commit or wait for 4.14.12 to see if that has fixed the issue.
After having located the bad commit please report the issue upstream.
Comment by Leonid Isaev (lisaev) - Friday, 05 January 2018, 19:03 GMT
@loqs, pls tell us the content of your post? pti=off is surely inspired by Meltdown? Otherwise, which upstream? (systemd or linux). Why do you even expect the issue to be in the kernel? systemd-networkd is surely at fault here, just because of dumping core and not dying gracefully. Chiming in on every bugreport isn't going to earn you any points...

@net147: can you at least boot with systemd.log_level=debug or something on the kernel cmdline? Google for the correct syntax because I'm too lazy to test. Also, it would help to see your networkd .netdev file (or whatever it is called these days). Finally, try netctl or manual creation. I can say that bridges work fine here (with netctl).
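
One possible way to get debug output from networkd alone, instead of the kernel command line option mentioned above, is a service drop-in that sets SYSTEMD_LOG_LEVEL. This is a different technique from the one suggested in the comment, shown only as a sketch; the file name debug.conf and the scratch directory are assumptions.

```shell
#!/bin/sh
# Sketch only: generates a drop-in that raises systemd-networkd's log level.
# Written to a scratch directory; the file name debug.conf is hypothetical.
set -e

DROPIN_DIR="${DROPIN_DIR:-$(mktemp -d)}"

cat > "$DROPIN_DIR/debug.conf" <<'EOF'
[Service]
Environment=SYSTEMD_LOG_LEVEL=debug
EOF

echo "drop-in written to $DROPIN_DIR/debug.conf"

# To use it (as root), copy the file to
#   /etc/systemd/system/systemd-networkd.service.d/debug.conf
# then run:
#   systemctl daemon-reload && systemctl restart systemd-networkd
# and read the log with:
#   journalctl -b -u systemd-networkd
```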
Comment by loqs (loqs) - Friday, 05 January 2018, 19:43 GMT
@lisaev the issue is filed against the kernel package, and I am working on the assumption that net147 can reproduce the issue just by updating the kernel package from 4.14.10 to 4.14.11, and that the issue no longer occurs after downgrading from 4.14.11 to 4.14.10. No, my suggestion is not inspired by Meltdown but by the proportion of commits in 4.14.11 that are for PTI [1]; testing with that option seemed to me a trivial way to rule out almost half of the patch set. Why would you think it is a systemd bug when the component that changed was the kernel?
[1] https://cdn.kernel.org/pub/linux/kernel/v4.x/ChangeLog-4.14.11 (see all the x86/ldt and x86/mm entries).
Comment by Eli Schwartz (eschwartz) - Friday, 05 January 2018, 19:55 GMT
+1

@lisaev,

I like it when loqs "chimes in on every bugreport". This is not about points, he provides useful and helpful advice on frequent occasion. Whereas from your comment, you don't seem to be too sure even about your own advice, let alone your reasons for discounting loqs' advice.
Comment by Jonathan Liu (net147) - Friday, 05 January 2018, 23:30 GMT
@loqs It makes no difference with pti=off. I am booting on an Intel system.

I found the systemd issue https://github.com/systemd/systemd/issues/7797 which is the same issue. Applying the patch from the pull request https://github.com/systemd/systemd/pull/7815/commits/f2c8bd2876d0171a5c1238fdfa48b415cf7cca60.patch seems to fix it.
Comment by loqs (loqs) - Tuesday, 09 January 2018, 22:33 GMT
