FS#74397 - [kubelet] Image garbage collection broken because imageFs can't be found
Attached to Project:
Community Packages
Opened by Wolfgang Walther (wolfgangwalther) - Friday, 08 April 2022, 07:01 GMT
Last edited by David Runge (dvzrv) - Friday, 29 April 2022, 07:02 GMT
Details
I'm running a kubernetes cluster on Arch. The journal for
kubelet.service is flooded with the following errors every
few minutes:
E0407 [timestamp] 479 kubelet.go:1347] "Image garbage collection failed multiple times in a row" err="failed to get imageFs info: non-existent label \"docker-images\""

After running the cluster for a few weeks with a lot of containers spinning up and down (running a GitLab CI instance with GitLab Runner), the disk was 100% full because no images were cleaned up.

I observe the same on both nodes, which are set up identically:
- Using docker/containerd as run-time
- btrfs as the root filesystem
- btrfs as the docker storage driver

I run kubelet 1.23.5-1 right now, but started a few versions back.

This issue reliably shows up when kubelet.service is started before docker.service, as described in [1]. When docker.service is started first, the problem disappears. Adding After=docker.service to the [Unit] section of kubelet.service fixed it for me.

It seems like this was fixed in cri-o last year [2], although I would argue that this is better fixed in the kubelet.service file for all container run-times together.

[1]: https://github.com/cri-o/cri-o/issues/4437
[2]: https://github.com/cri-o/cri-o/pull/4443
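For anyone who wants to try the workaround without editing the packaged unit file, it could be written as a systemd drop-in. This is only a sketch: the drop-in path and file name in the comment are my own choice, not something shipped by the package. The only directive taken from this report is After=docker.service.

```ini
# Hypothetical drop-in: /etc/systemd/system/kubelet.service.d/10-after-docker.conf
# (the file name is an assumption; any *.conf name in this directory works)

[Unit]
# Order kubelet after docker when both are started in the same transaction.
# After= only affects ordering; it does not pull docker.service in by itself.
After=docker.service
```

After creating the file, run systemctl daemon-reload and restart kubelet.service (or reboot) so the new ordering takes effect.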
Closed by David Runge (dvzrv)
Friday, 29 April 2022, 07:02 GMT
Reason for closing: Fixed
Additional comments about closing: Fixed with kubelet 1.23.6-2
We're always happy about feedback. I'll look into this as soon as possible!
I checked, and this does **not** fix it.
I did the following:
1. Removed After=docker.service from kubelet.service.
2. Rebooted and observed the error message showing up again after about 5 minutes.
3. Added Before=kubelet.service to containerd.service.
4. Rebooted - and observed the error message again in intervals of 5 minutes.
5. Moved Before=kubelet.service from containerd.service to docker.service.
6. Rebooted - and after 15 minutes there was still no error message, it's working again.
Adding this to containerd.service does not fix it, but adding it to docker.service does.
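For reference, the variant that worked in step 5 could be expressed as a drop-in for docker.service. Again only a sketch: the path and file name in the comment are assumptions. The directive itself is the one tested above.

```ini
# Hypothetical drop-in: /etc/systemd/system/docker.service.d/10-before-kubelet.conf
# (the file name is an assumption; any *.conf name in this directory works)

[Unit]
# Start docker before kubelet when both are part of the same transaction.
# The same line placed in containerd.service did not help in the test above.
Before=kubelet.service
```

Before=kubelet.service on docker.service expresses the same ordering as After=docker.service on kubelet.service, so only one of the two is needed.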
I'll modify kubelet.service, although I am not happy about this kind of "special knowledge" accumulating in various downstream locations.
Hopefully this can be upstreamed to the kubernetes project.