FS#72496 - [lxc] 1:4.0.11-1 breaks all lxd containers

Attached to Project: Community Packages
Opened by Ao Zhong (hacc) - Thursday, 21 October 2021, 01:08 GMT
Last edited by Jonas Witschel (diabonas) - Wednesday, 10 November 2021, 18:31 GMT
Task Type Bug Report
Category Packages
Status Closed
Assigned To Morten Linderud (Foxboron)
Architecture x86_64
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 2
Private No

Details

Description:
After upgrading lxc from 1:4.0.10-2 to 1:4.0.11-1, none of my lxd containers can start. After downgrading lxc back to 1:4.0.10-2, all containers work fine again.
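
The downgrade itself is just the following, assuming the previous package is still in the local pacman cache:

sudo pacman -U /var/cache/pacman/pkg/lxc-1:4.0.10-2-x86_64.pkg.tar.zst   # exact filename depends on your cache
sudo systemctl restart lxd                                               # lxd must reload the downgraded liblxc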

Additional info:
* package version(s): lxc 1:4.0.11-1 (broken), lxc 1:4.0.10-2 (working)
* config and/or log files: lxc info --show-log output below

This is the output of lxc info --show-log:
Name: softether
Status: STOPPED
Type: container
Architecture: x86_64
Created: 2021/09/06 14:15 CEST
Last Used: 2021/10/21 02:48 CEST

Log:

lxc softether 20211021004842.900 ERROR start - start.c:core_scheduling:1572 - No such device - Failed to create new core scheduling domain
lxc softether 20211021004842.901 ERROR lxccontainer - lxccontainer.c:wait_on_daemonized_start:867 - Received container state "ABORTING" instead of "RUNNING"
lxc softether 20211021004842.903 ERROR start - start.c:__lxc_start:2068 - Failed to spawn container "softether"
lxc softether 20211021004842.903 WARN start - start.c:lxc_abort:1038 - No such process - Failed to send SIGKILL via pidfd 17 for process 5241
lxc 20211021004847.956 ERROR af_unix - af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20211021004847.956 ERROR commands - commands.c:lxc_cmd_rsp_recv_fds:127 - Failed to receive file descriptors

Steps to reproduce:
Upgrade lxc to 1:4.0.11-1 and restart lxd, then start the container.
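
Concretely, something like this (assuming lxd runs as a systemd service; the container name is the one from my log):

sudo pacman -S lxc            # pulls in 1:4.0.11-1 from the repos
sudo systemctl restart lxd
lxc start softether           # fails with the core scheduling error shown above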

Closed by Jonas Witschel (diabonas)
Wednesday, 10 November 2021, 18:31 GMT
Reason for closing: Fixed
Additional comments about closing: lxd 4.20-1
Comment by Niklas (muffyrut) - Friday, 22 October 2021, 20:49 GMT
I can confirm this bug exactly as described. Here's the output from lxd.log:

t=2021-10-22T22:33:32+0200 lvl=info msg="LXD is starting" mode=normal path=/var/lib/lxd version=4.19
t=2021-10-22T22:33:32+0200 lvl=info msg="Kernel uid/gid map:"
t=2021-10-22T22:33:32+0200 lvl=info msg=" - u 0 0 4294967295"
t=2021-10-22T22:33:32+0200 lvl=info msg=" - g 0 0 4294967295"
t=2021-10-22T22:33:32+0200 lvl=info msg="Configured LXD uid/gid map:"
t=2021-10-22T22:33:32+0200 lvl=info msg=" - u 0 1000000 1000000000"
t=2021-10-22T22:33:32+0200 lvl=info msg=" - g 0 1000000 1000000000"
t=2021-10-22T22:33:32+0200 lvl=info msg="Kernel features:"
t=2021-10-22T22:33:32+0200 lvl=info msg=" - closing multiple file descriptors efficiently: yes"
t=2021-10-22T22:33:32+0200 lvl=info msg=" - netnsid-based network retrieval: yes"
t=2021-10-22T22:33:32+0200 lvl=info msg=" - pidfds: yes"
t=2021-10-22T22:33:32+0200 lvl=info msg=" - core scheduling: yes"
t=2021-10-22T22:33:32+0200 lvl=info msg=" - uevent injection: yes"
t=2021-10-22T22:33:32+0200 lvl=info msg=" - seccomp listener: yes"
t=2021-10-22T22:33:32+0200 lvl=info msg=" - seccomp listener continue syscalls: yes"
t=2021-10-22T22:33:32+0200 lvl=info msg=" - seccomp listener add file descriptors: yes"
t=2021-10-22T22:33:32+0200 lvl=info msg=" - attach to namespaces via pidfds: yes"
t=2021-10-22T22:33:32+0200 lvl=info msg=" - safe native terminal allocation : yes"
t=2021-10-22T22:33:32+0200 lvl=info msg=" - unprivileged file capabilities: yes"
t=2021-10-22T22:33:32+0200 lvl=info msg=" - cgroup layout: cgroup2"
t=2021-10-22T22:33:32+0200 lvl=warn msg=" - Couldn't find the CGroup hugetlb controller, hugepage limits will be ignored"
t=2021-10-22T22:33:32+0200 lvl=warn msg=" - Couldn't find the CGroup network priority controller, network priority will be ignored"
t=2021-10-22T22:33:32+0200 lvl=info msg=" - shiftfs support: no"
t=2021-10-22T22:33:32+0200 lvl=warn msg="Instance type not operational" driver=qemu err="QEMU command not available for architecture" type=virtual-machine
t=2021-10-22T22:33:32+0200 lvl=info msg="Initializing local database"
t=2021-10-22T22:33:32+0200 lvl=info msg="Set client certificate to server certificate" fingerprint=f44fd4d93be86e4dc6383030684f180ac1dca0f00540fc36e7285b5602313a19
t=2021-10-22T22:33:32+0200 lvl=info msg="Starting database node" address=1 id=1 role=voter
t=2021-10-22T22:33:32+0200 lvl=info msg="Starting /dev/lxd handler:"
t=2021-10-22T22:33:32+0200 lvl=info msg=" - binding devlxd socket" socket=/var/lib/lxd/devlxd/sock
t=2021-10-22T22:33:32+0200 lvl=info msg="REST API daemon:"
t=2021-10-22T22:33:32+0200 lvl=info msg=" - binding Unix socket" inherited=true socket=/var/lib/lxd/unix.socket
t=2021-10-22T22:33:32+0200 lvl=info msg="Initializing global database"
t=2021-10-22T22:33:32+0200 lvl=info msg="Connecting to global database"
t=2021-10-22T22:33:32+0200 lvl=info msg="Connected to global database"
t=2021-10-22T22:33:32+0200 lvl=info msg="Initialized global database"
t=2021-10-22T22:33:32+0200 lvl=info msg="Firewall loaded driver" driver=nftables
t=2021-10-22T22:33:32+0200 lvl=info msg="Initializing storage pools"
t=2021-10-22T22:33:32+0200 lvl=info msg="Initializing daemon storage mounts"
t=2021-10-22T22:33:32+0200 lvl=info msg="Loading daemon configuration"
t=2021-10-22T22:33:32+0200 lvl=info msg="Initializing networks"
t=2021-10-22T22:33:32+0200 lvl=info msg="Pruning leftover image files"
t=2021-10-22T22:33:32+0200 lvl=info msg="Done pruning leftover image files"
t=2021-10-22T22:33:32+0200 lvl=info msg="Started seccomp handler" path=/var/lib/lxd/seccomp.socket
t=2021-10-22T22:33:32+0200 lvl=info msg="Pruning expired images"
t=2021-10-22T22:33:32+0200 lvl=info msg="Done pruning expired images"
t=2021-10-22T22:33:32+0200 lvl=info msg="Pruning expired instance backups"
t=2021-10-22T22:33:32+0200 lvl=info msg="Done pruning expired instance backups"
t=2021-10-22T22:33:32+0200 lvl=info msg="Pruning resolved warnings"
t=2021-10-22T22:33:32+0200 lvl=info msg="Updating instance types"
t=2021-10-22T22:33:32+0200 lvl=info msg="Done pruning resolved warnings"
t=2021-10-22T22:33:32+0200 lvl=info msg="Done updating instance types"
t=2021-10-22T22:33:32+0200 lvl=info msg="Expiring log files"
t=2021-10-22T22:33:32+0200 lvl=info msg="Done expiring log files"
t=2021-10-22T22:33:32+0200 lvl=info msg="Updating images"
t=2021-10-22T22:33:32+0200 lvl=info msg="Done updating images"
t=2021-10-22T22:33:33+0200 lvl=info msg="Starting container" action=start created=2021-10-02T19:30:43+0200 ephemeral=false instance=apps instanceType=container project=default stateful=false used=2021-10-19T18:37:16+0200
t=2021-10-22T22:33:34+0200 lvl=info msg="Downloading image" alias=archlinux server=https://images.linuxcontainers.org
t=2021-10-22T22:33:38+0200 lvl=eror msg="Failed starting container" action=start created=2021-10-02T19:30:43+0200 ephemeral=false instance=apps instanceType=container project=default stateful=false used=2021-10-19T18:37:16+0200
t=2021-10-22T22:33:38+0200 lvl=warn msg="Failed auto start instance attempt" attempt=1 err="Failed to run: /usr/bin/lxd forkstart apps /var/lib/lxd/containers /var/log/lxd/apps/lxc.conf: " instance=apps maxAttempts=3 project=default
Comment by Morten Linderud (Foxboron) - Friday, 22 October 2021, 21:12 GMT
This has been reported in #lxc previously and I have been unable to reproduce it. Does it work if you start a new container?
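
For example, something along these lines (image alias and container name are arbitrary):

lxc launch images:archlinux test-new   # creates and starts a fresh container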
Comment by Niklas (muffyrut) - Friday, 22 October 2021, 22:30 GMT
No, same problem. I didn't check the logs for the new container, but it got stuck on startup like the others, so I'm assuming it's exactly the same.
Comment by Niklas (muffyrut) - Saturday, 23 October 2021, 13:26 GMT
This is now tracked upstream and seems to be a problem with LXD, not lxc: https://github.com/lxc/lxd/issues/9419
Comment by Niklas (muffyrut) - Saturday, 23 October 2021, 18:06 GMT
Edit: Ignore this comment.
Comment by G3ro (G3ro) - Thursday, 28 October 2021, 18:06 GMT
I think that Foxboron is already aware of this, but for everyone else:
There are additional patches necessary for LXD, listed in this PR: https://github.com/lxc/lxd/pull/9352

I guess that the maintainers will either wait for a new upstream release or include the patches manually.

An all-in-one patch would probably be: https://patch-diff.githubusercontent.com/raw/lxc/lxd/pull/9352.patch
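
A rough sketch of applying it when rebuilding the package (the source directory name is an assumption):

curl -LO https://patch-diff.githubusercontent.com/raw/lxc/lxd/pull/9352.patch
cd lxd-4.19                   # unpacked lxd source tree, name assumed
patch -Np1 -i ../9352.patch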
Comment by Niklas (muffyrut) - Sunday, 07 November 2021, 16:40 GMT
I think this can be closed now that LXD 4.20 has been released; at least for me, the issue is fixed.
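
For anyone verifying, something like this (container name from my earlier log):

sudo pacman -Syu lxd          # picks up lxd 4.20-1
sudo systemctl restart lxd
lxc start apps                # now starts without errors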
