FS#67773 - [containerd] version 1.4.0-2 break docker

Attached to Project: Community Packages
Opened by Sébastien Luttringer (seblu) - Sunday, 30 August 2020, 18:45 GMT
Last edited by Morten Linderud (Foxboron) - Friday, 11 September 2020, 22:44 GMT
Task Type Bug Report
Category Upstream Bugs
Status Closed
Assigned To Sébastien Luttringer (seblu)
Morten Linderud (Foxboron)
Architecture All
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 1
Private No

Details

Description:
The new release of containerd breaks docker container management.

1) The health check is reported unhealthy while the containers are running fine.
2) docker stop does not really stop the container and leaves it unable to restart.
3) systemctl stop/kill leaves processes running on the system.


BBS thread: https://bbs.archlinux.org/viewtopic.php?pid=1922020#p1922020
Upstream report: https://github.com/containerd/containerd/issues/4509
Closed by  Morten Linderud (Foxboron)
Friday, 11 September 2020, 22:44 GMT
Reason for closing:  Fixed
Additional comments about closing:  1.4.0-3
Comment by Christian Felsing (83fd4c18bea05588) - Tuesday, 01 September 2020, 11:41 GMT
Reproduction:

* Install containerd 1.4.0-2, docker 1:19.03.12-2, docker-compose 1.26.2-1
* Set up some docker containers with e.g. docker-compose

In my case a Nginx (Alpine Linux), MySQL (Debian-10-Slim), Postgres and a PHP application were deployed with docker-compose.

* Wait about 1h. All containers then change from healthy to unhealthy, and the other problems mentioned in the bug description above appear.
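
To watch for the healthy-to-unhealthy transition during the wait, something like the following sketch may help. The function names `list_unhealthy` and `health_of` are made up for illustration; the `docker ps --filter health=...` and `docker inspect --format` calls are standard Docker CLI, but the exact output depends on your setup.

```shell
# List the names of containers whose health check currently reports "unhealthy".
list_unhealthy() {
    docker ps --filter health=unhealthy --format '{{.Names}}'
}

# Print the health status of one container ("starting", "healthy" or "unhealthy").
# This fails for containers that define no health check at all.
health_of() {
    docker inspect --format '{{.State.Health.Status}}' "$1"
}

# Example: poll once a minute and log any unhealthy containers.
#   while sleep 60; do date; list_unhealthy; done
```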
Comment by Christian Felsing (83fd4c18bea05588) - Wednesday, 02 September 2020, 02:52 GMT
Workaround:

* Download latest containerd from 1.3 tree: https://containerd.io/downloads/
* Unpack archive in /usr/local - this should end up with at least /usr/local/bin/containerd
* # /usr/local/bin/containerd --version should display "containerd github.com/containerd/containerd v1.3.5 9b6f3ec0307a825c38617b93ad55162b5bb94234"
* cp -a /usr/lib/systemd/system/containerd.service /usr/lib/systemd/system/containerd.service.BAK
* Patch ExecStart in /usr/lib/systemd/system/containerd.service to ExecStart=/usr/local/bin/containerd
* reboot

You now have a working Docker platform until a working 1.4 version of containerd is available.
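
The backup and ExecStart steps above could be scripted roughly as follows. This is only a sketch: `patch_execstart` is a hypothetical helper name, it assumes GNU sed (`-i`), and it assumes the binary path contains no `|` or `&` characters. Keep in mind that a package upgrade will likely overwrite the file under /usr/lib again.

```shell
# Back up a unit file and point its ExecStart at a different containerd binary.
# Usage: patch_execstart UNIT_FILE NEW_BINARY
patch_execstart() {
    unit=$1
    binary=$2
    # Keep a copy of the original, preserving mode and timestamps.
    cp -a "$unit" "$unit.BAK" || return 1
    # Replace the whole ExecStart= line; ExecStartPre= lines are left alone.
    sed -i "s|^ExecStart=.*|ExecStart=$binary|" "$unit"
}

# Example (as root), followed by the reboot from the list above:
#   patch_execstart /usr/lib/systemd/system/containerd.service /usr/local/bin/containerd
```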
Comment by Sébastien Luttringer (seblu) - Wednesday, 02 September 2020, 08:50 GMT
This task was opened to move the mail discussions between developers about this issue here. The first report would have been a good place, but it was already closed and it went under the radar.
Comment by Morten Linderud (Foxboron) - Friday, 04 September 2020, 20:35 GMT
Can someone test this package?

https://pkgbuild.com/~foxboron/repos/containerd/

`pacman -U https://pkgbuild.com/~foxboron/repos/containerd/containerd-1.4.0-3-x86_64.pkg.tar.zst`

It's compiled without `-buildmode=pie`, and I wonder if the health check failures are related to a race condition. This isn't checked during the tests and `-buildmode=pie` and `-race` are strictly forbidden.
Comment by Christian Felsing (83fd4c18bea05588) - Saturday, 05 September 2020, 14:26 GMT
Looks better, after 39h still "healthy".
Comment by DeLord (DeLord) - Sunday, 06 September 2020, 22:21 GMT
I can confirm the bug, am testing the updated package now and will report back tomorrow
Comment by Wilensky (wilensky) - Tuesday, 08 September 2020, 11:28 GMT
I can confirm this bug as well, on 5.7.17-2-MANJARO kernel + docker 19.03.12 + containerd v1.4.0m + runc 1.0.0-rc92.

There is also a recent discussion of this issue in the moby repository.

https://github.com/moby/moby/issues/41410

thaJeztah presumes that go1.14.5 shouldn't be used to build docker, runc, or containerd, as it is problematic.
Comment by Morten Linderud (Foxboron) - Tuesday, 08 September 2020, 11:38 GMT
Manjaro is not supported and none of the supplied information details anything useful.

The moby issue is containerd related and linking this around does nothing but create noise for upstream.


EDIT: Please be useful and read the bug report in full.
Comment by Wilensky (wilensky) - Tuesday, 08 September 2020, 12:35 GMT
I read the bug report several times and this is exactly what I experience. My bad, I missed the "Upstream report" link in the description.
Thanks for pointing that out, and sorry for being useless.
Comment by DeLord (DeLord) - Tuesday, 08 September 2020, 18:28 GMT
The updated package doesn't fix this: after some hours the health check is still unresponsive and the container cannot be stopped. @Foxboron I saw your notes in the upstream bug report that it doesn't fix it, do you have any further ideas?
Comment by Morten Linderud (Foxboron) - Tuesday, 08 September 2020, 18:33 GMT
So the guy from the bug report failed to mention they run HEAD. Removing -buildmode=pie doesn't solve it. However, I tried compiling from HEAD, and it seems to work just fine so far!

https://pkgbuild.com/~foxboron/repos/containerd/containerd-1.4.0-3.5-x86_64.pkg.tar.zst

Try this and please check. I'll bisect the thing when I have time.
Comment by Kirill (Zack) - Wednesday, 09 September 2020, 08:32 GMT
After installing containerd-1.4.0-3.5-x86_64.pkg.tar.zst I immediately notice 100% CPU consumption by pihole (latest). Memory consumption after a few minutes is 10 times more than usual and continues to grow. In particular, lighttpd process in the container is going crazy. It does not do any kind of blacklist updates or anything, just normal operations. This is not the case with containerd-1.3.4-2-x86_64.pkg.tar.zst.
Can't really run it for a long time in this state, have to revert to older containerd.

After 5 minutes with the latest containerd:
CONTAINER ID   NAME     CPU %     MEM USAGE / LIMIT     MEM %   NET I/O           BLOCK I/O     PIDS
5945158458af   pihole   100.30%   254.5MiB / 31.22GiB   0.80%   1.96MB / 49.6kB   0B / 22.2MB   20

With containerd 1.3.4-2:
CONTAINER ID   NAME     CPU %     MEM USAGE / LIMIT     MEM %   NET I/O           BLOCK I/O     PIDS
5945158458af   pihole   0.05%     18.24MiB / 31.22GiB   0.06%   1.94MB / 56.2kB   0B / 22.9MB   20



Other containers' memory consumption and CPU usage seem to be OK.
One of my custom containers refused to start with 1.4, can't debug that now, unfortunately.
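
For anyone comparing builds the same way, a one-shot check for runaway containers might look like the sketch below. `busy_containers` is a hypothetical helper name and the default threshold of 90 is arbitrary; `docker stats --no-stream --format` is standard Docker CLI.

```shell
# Print the names of containers whose CPU usage exceeds a threshold (in percent).
# Usage: busy_containers [THRESHOLD]   (THRESHOLD defaults to 90)
busy_containers() {
    threshold=${1:-90}
    # One snapshot of "name cpu%" per container; strip the % sign before comparing.
    docker stats --no-stream --format '{{.Name}} {{.CPUPerc}}' |
        awk -v t="$threshold" '{ cpu = $2; sub(/%/, "", cpu); if (cpu + 0 > t + 0) print $1 }'
}
```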
Comment by Morten Linderud (Foxboron) - Thursday, 10 September 2020, 19:42 GMT
Comment by Kirill (Zack) - Thursday, 10 September 2020, 21:25 GMT
No immediate negative effects with this build.
Will check in the morning.
Comment by Kirill (Zack) - Friday, 11 September 2020, 08:16 GMT
11 hours, everything is stable.
Resource consumption is normal, health checks are working.
