FS#55149 - [linux] 4.12.7 hangs when instance of systemd-nspawn is run

Attached to Project: Arch Linux
Opened by Vladimir (_v_l) - Tuesday, 15 August 2017, 06:28 GMT
Last edited by Andreas Radke (AndyRTR) - Tuesday, 01 March 2022, 21:09 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To Tobias Powalowski (tpowa)
Architecture All
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description: My system (Arch Linux x86_64, Intel i5 2410M) hangs when I start a systemd-nspawn instance.

Additional info:
* package version(s):

kernel 4.12.7
systemd 234.11-8

* config and/or log files etc.

$ cat /etc/systemd/nspawn/node1-smoon4.nspawn
[Files]
Bind=/mnt/storage/yandex-disk:/mnt/yandex-disk

[Network]
Zone=smoon4

$ cat /etc/systemd/system/systemd-nspawn@node1-smoon4.service.d/memory.conf
[Service]
MemoryAccounting=yes
MemoryHigh=1.5G
MemoryMax=2G


Steps to reproduce:

It is 100% reproducible, but there is no single reliable recipe; I can only describe the steps that led to the hang:
1. I upgraded packages today (systemd 234.11-8 and kernel 4.12.7), rebooted, logged into awesome (SDDM -> awesome), started a terminal (urxvt-unicode) and started the systemd-nspawn instance. After a second the system became unresponsive.
2. After 3 minutes the laptop rebooted. I repeated the above steps three times, with the same result.
3. After the third time I decided to start another systemd-nspawn instance:

$ cat /etc/systemd/nspawn/node2-smoon4.nspawn
[Exec]
Environment=DISPLAY=:0
Environment=XAUTHORITY=~vladimir/.Xauthority

[Files]
Bind=/tmp/.X11-unix
Bind=/mnt/storage/makepkg:/mnt/makepkg

[Network]
Zone=smoon4

$ cat /etc/systemd/system/systemd-nspawn@node2-smoon4.service.d/memory.conf
[Service]
MemoryAccounting=yes
MemoryHigh=1.5G
MemoryMax=2G
MemorySwapMax=1G
# MemoryHigh=500M
# MemoryMax=900M
# MemorySwapMax=10M

At first all went well. Then I started the first systemd-nspawn instance, and for the first few moments everything was fine, so I decided to start Firefox and Chromium, and then the system hung again.

I checked the SSD (SMART), tested memory, and finally tried downgrading some packages. I tested with systemd 234.11-6 and kernel 4.12.6, and it seems that only the kernel matters. Right now I use the combination of systemd 234.11-8 and kernel 4.12.6, and it is stable.

The strange thing is that the same kernel and systemd versions (4.12.7, 234.11-8) seem to work fine on my other hosts: Intel i5 6200 (SSD disk), Intel i5 4570 and Intel i5 7400. All hosts have nearly identical configuration (SSD disk, systemd-nspawn configuration, XFS), OS and packages.

---
Vladimir Lomov

P.S. Should I report this upstream? If so, should I report it to the mailing list or the bug tracker?


Closed by  Andreas Radke (AndyRTR)
Tuesday, 01 March 2022, 21:09 GMT
Reason for closing:  Fixed
Additional comments about closing:  Fixed upstream.
Comment by Vladimir (_v_l) - Wednesday, 16 August 2017, 02:53 GMT
I think I found the cause, and my previous assumption (about which kernel version is affected by the bug) was wrong. But first, some additional information:

- the systemd-nspawn instance node1-smoonX is used to run the Yandex.Disk daemon (to synchronize files and directories);
- before kernel 4.12 I had used the linux-ck kernel with the bfq scheduler for some time (more than half a year);
- since bfq was integrated into kernel 4.12 and CK hadn't provided his patches for that version, I decided to try the mainline kernel (linux from the repo);
- so I ran kernel 4.12.5 on all my hosts and it worked fine (with the node1-smoonX instance);
- I have several hosts with SSD and HDD disks; the system is installed on the SSD, and all disks (until recently) used bfq as the scheduler (it was set using a udev rule);
- the host that first hung for no apparent reason other than starting the node1-smoonX systemd-nspawn instance was smoon4 (hence the instance is named node1-smoon4), see my initial report;
- the other hosts were fine until today. Today I tried installing linux-ck 4.12.7-ck2 (CK released his patches for kernel 4.12) on host smoon2, and it hung after starting instance node1-smoon2. I removed linux-ck and tried again with the mainline kernel and it hung too, but this time I got information from journald (journalctl -k -f), see the attached file. According to the kernel messages, some problem occurs in bfq and that hangs the kernel.
- I changed the scheduler to 'kyber' for HDDs and 'mq-deadline' for SSDs, and now everything works fine.
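
For reference, the scheduler switch described in the last item can be sketched as a small shell script. This is only an illustration under stated assumptions: the helper names active_scheduler and pick_scheduler are hypothetical, the sd* device glob is a guess at the disk naming, and the HDD/SSD mapping simply mirrors the choice above. It relies on the standard sysfs attributes queue/scheduler (the active scheduler is shown in brackets) and queue/rotational (1 for HDD, 0 for SSD). The loop is a dry run and only prints what it would change.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; the names are not from the report.

# Print the active scheduler from a queue/scheduler line. The kernel marks
# the active one with brackets, e.g. "mq-deadline kyber [bfq] none".
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Map the queue/rotational flag to the scheduler chosen in the report:
# 1 (HDD) -> kyber, 0 (SSD) -> mq-deadline. Fails on any other value.
pick_scheduler() {
    case "$1" in
        1) echo kyber ;;
        0) echo mq-deadline ;;
        *) return 1 ;;
    esac
}

# Dry run over sd* disks (assumed naming); nothing is written to sysfs.
for dev in /sys/block/sd*; do
    [ -r "$dev/queue/rotational" ] || continue
    want=$(pick_scheduler "$(cat "$dev/queue/rotational")") || continue
    now=$(active_scheduler "$(cat "$dev/queue/scheduler")")
    echo "${dev##*/}: active=$now, would set $want"
    # To apply (as root): echo "$want" > "$dev/queue/scheduler"
done
```

A change made this way does not survive a reboot, which is presumably why a udev rule was used on these hosts.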

I'll try to report the bfq behavior upstream.
Comment by Vladimir (_v_l) - Wednesday, 16 August 2017, 02:54 GMT
Part of the dmesg output (copied and pasted from the terminal) on the smoon2 host.
Comment by Vladimir (_v_l) - Tuesday, 22 August 2017, 13:21 GMT
This is an upstream bug in BFQ.

I opened a bug ticket on bugzilla.kernel.org: https://bugzilla.kernel.org/show_bug.cgi?id=196675. After searching the bfq-iosched Google group I found that this is a known issue: https://groups.google.com/forum/#!topic/bfq-iosched/2odL08qoPS0, https://groups.google.com/forum/#!topic/bfq-iosched/7I3DnJ2BuQ8, https://groups.google.com/forum/#!topic/bfq-iosched/H_92hgaqgIQ.

I managed to "resolve" it by applying patches I found on the linux-block mailing list (see the bug report and the last thread on bfq-iosched). These patches will be in kernel 4.14, so I think this task can be closed after kernel 4.14 is released.
Comment by loqs (loqs) - Tuesday, 21 November 2017, 19:40 GMT
Does the issue still occur with linux 4.14.1-1 now in testing?
Comment by Vladimir (_v_l) - Wednesday, 22 November 2017, 00:35 GMT
I'm not sure, because due to this problem I had to build the kernel with the BFQ patches found on linux-block. I hope to stop building my own kernel when 4.15 is released (I'm watching the linux-block ML, and some new BFQ patches were published; they will definitely be in 4.15).
Comment by mattia (nTia89) - Monday, 28 February 2022, 16:52 GMT
I cannot reproduce the issue. Is it still valid for you?
Comment by Vladimir (_v_l) - Monday, 28 February 2022, 23:41 GMT
I'm now using kernel 5.15.25 and don't see any issue with systemd-nspawn and the kernel.
