FS#75925 - [linux] System freezes since 5.19.8 when using docker
Attached to Project:
Arch Linux
Opened by Patrick (suiiii) - Friday, 16 September 2022, 20:34 GMT
Last edited by Toolybird (Toolybird) - Sunday, 23 October 2022, 21:03 GMT
Description:
The system freezes when using Docker on 5.19.8 and 5.19.9. It happens on docker run, docker pull and docker prune, most reliably when pruning the system; run and pull seem to work for a short while (one run, maybe two) before the freeze. After downgrading to 5.19.7 the system works fine again. There was no Docker update during the kernel updates, so I figure it is an upstream problem, but I could not find any other reports of it and wanted to report here first before going upstream. I attached three dumps from journalctl, and I am also pasting part of them here so Google can pick them up.

First I get a warning:

    WARNING: CPU: 26 PID: 1150 at fs/kernfs/dir.c:504 __kernfs_remove.part.0+0x2bf/0x300
    ...
    Call Trace:
    <TASK>
    ? cpumask_next+0x22/0x30
    ? kernfs_name_hash+0x12/0x80
    kernfs_remove_by_name_ns+0x64/0xb0
    sysfs_slab_add+0x166/0x200
    __kmem_cache_create+0x3f1/0x4e0
    kmem_cache_create_usercopy+0x172/0x2e0
    kmem_cache_create+0x16/0x20
    bioset_init+0x202/0x280
    dm_alloc_md_mempools+0xe5/0x180 [dm_mod e0e7e531acb17cea3054e278f4217ef31a69a6b7]
    dm_table_complete+0x3a0/0x690 [dm_mod e0e7e531acb17cea3054e278f4217ef31a69a6b7]
    table_load+0x171/0x2f0 [dm_mod e0e7e531acb17cea3054e278f4217ef31a69a6b7]
    ? dev_suspend+0x2c0/0x2c0 [dm_mod e0e7e531acb17cea3054e278f4217ef31a69a6b7]
    ctl_ioctl+0x206/0x460 [dm_mod e0e7e531acb17cea3054e278f4217ef31a69a6b7]
    dm_ctl_ioctl+0xe/0x20 [dm_mod e0e7e531acb17cea3054e278f4217ef31a69a6b7]
    __x64_sys_ioctl+0x94/0xd0
    do_syscall_64+0x5f/0x90
    ? exit_to_user_mode_prepare+0x16f/0x1d0
    ? syscall_exit_to_user_mode+0x1b/0x40
    ? do_syscall_64+0x6b/0x90
    ? exc_page_fault+0x74/0x170
    entry_SYSCALL_64_after_hwframe+0x63/0xcd

Followed by a kernel BUG:

    kernel BUG at mm/slub.c:381!
    invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
    CPU: 26 PID: 1150 Comm: dockerd Tainted: G W 5.19.9-arch1-1 #1 3da5a84b9442a05cd5bc412feaf8d6ab31862ed4
    ...
    Call Trace:
    <TASK>
    kernfs_put.part.0+0x58/0x1a0
    __kernfs_remove.part.0+0x18c/0x300
    ? cpumask_next+0x22/0x30
    ? kernfs_name_hash+0x12/0x80
    kernfs_remove_by_name_ns+0x64/0xb0
    sysfs_slab_add+0x166/0x200
    __kmem_cache_create+0x3f1/0x4e0
    kmem_cache_create_usercopy+0x172/0x2e0
    kmem_cache_create+0x16/0x20
    bioset_init+0x202/0x280
    dm_alloc_md_mempools+0xe5/0x180 [dm_mod e0e7e531acb17cea3054e278f4217ef31a69a6b7]
    dm_table_complete+0x3a0/0x690 [dm_mod e0e7e531acb17cea3054e278f4217ef31a69a6b7]
    table_load+0x171/0x2f0 [dm_mod e0e7e531acb17cea3054e278f4217ef31a69a6b7]
    ? dev_suspend+0x2c0/0x2c0 [dm_mod e0e7e531acb17cea3054e278f4217ef31a69a6b7]
    ctl_ioctl+0x206/0x460 [dm_mod e0e7e531acb17cea3054e278f4217ef31a69a6b7]
    dm_ctl_ioctl+0xe/0x20 [dm_mod e0e7e531acb17cea3054e278f4217ef31a69a6b7]
    __x64_sys_ioctl+0x94/0xd0
    do_syscall_64+0x5f/0x90
    ? exit_to_user_mode_prepare+0x16f/0x1d0
    ? syscall_exit_to_user_mode+0x1b/0x40
    ? do_syscall_64+0x6b/0x90
    ? exc_page_fault+0x74/0x170
    entry_SYSCALL_64_after_hwframe+0x63/0xcd

Additional info:

    docker --version
    Docker version 20.10.18, build b40c2f6b5d
    uname -r
    5.19.9-arch1-1

Steps to reproduce:
* be on 5.19.8-arch1-1 or 5.19.9-arch1-1
* run `docker system prune -a -f --volumes` (the system needs to have images pulled, containers, volumes, etc. - essentially it needs to have data); otherwise do a `docker pull` (maybe multiple)
* system freezes

(A scripted version of these steps is sketched below.)
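For reference, a minimal shell sketch of the reproduction, assuming Docker is running and already holds images, containers and volumes to prune; the image name is only an example, and capturing journalctl in the background is just a convenience so the traces survive the freeze:

    # Keep a live copy of kernel messages so the traces survive the freeze
    journalctl -kf > /tmp/kernel-freeze.log &

    # pull/run usually work once or twice before the freeze
    docker pull archlinux:latest
    docker run --rm archlinux:latest true

    # pruning triggers the freeze most reliably (needs existing data)
    docker system prune -a -f --volumes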
https://drive.google.com/file/d/1yH0ImhBsv6eOXauulhaPsWVUUTWpVO4_/view?usp=sharing linux-5.19.7-1-x86_64.pkg.tar.zst
https://drive.google.com/file/d/1y6XCqxq-JgBS6vSQc733cESQViJ4XSrS/view?usp=sharing linux-headers-5.19.7-1-x86_64.pkg.tar.zst
https://drive.google.com/file/d/1JWip1texRp2iI8uFJPURYLh9u1-muwoO/view?usp=sharing linux-5.19.8-1-x86_64.pkg.tar.zst
https://drive.google.com/file/d/16ARyEKUjFCUrksJ3M5T60qxVXFbnfUe5/view?usp=sharing linux-headers-5.19.8-1-x86_64.pkg.tar.zst
https://drive.google.com/file/d/1ftqZrJtYiCSBW927VgPEIdmYYTR1JsKu/view?usp=sharing linux-5.19.7.r78.gbb4be611c2f5-1-x86_64.pkg.tar.zst
https://drive.google.com/file/d/1xedA6u5LIMih-kW8nGYFRbd6jbqIonpj/view?usp=sharing linux-headers-5.19.7.r78.gbb4be611c2f5-1-x86_64.pkg.tar.zst
[1] https://wiki.archlinux.org/title/Bisecting_bugs_with_Git
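For anyone testing the packages above, installing one directly with pacman and rebooting should be enough. A rough sketch, assuming the files were downloaded to the current directory and that the kernel should be held back while testing:

    # Install a specific test kernel plus matching headers from local files
    sudo pacman -U linux-5.19.7-1-x86_64.pkg.tar.zst linux-headers-5.19.7-1-x86_64.pkg.tar.zst

    # Optional: keep pacman from upgrading it mid-test by adding
    #   IgnorePkg = linux linux-headers
    # to /etc/pacman.conf

    sudo reboot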
It looks like the problem is not reliably reproducible after all.
I did some testing with upstream 5.19.7 and 5.19.8 and both seemed to work fine. Afterwards I upgraded to Arch 5.19.8, which was also fine (?). Arch 5.19.9 also worked fine for some time, until I tried another system prune.
Each time I did a bunch of pulls, runs, builds, and prunes, which usually caused the problem after 1-3 operations.
I also found this upstream bug report, where I linked this ticket as well: https://bugzilla.kernel.org/show_bug.cgi?id=216493
There is also this lkml thread which seems to cover the root cause: https://lore.kernel.org/lkml/20220913121723.691454-1-lk%40c--e.de/T/#mc068df068cfd19c43b16542e74d4b72dfc1b0569
I guess I'll stick with 5.19.7 on my main machine for now and try to get a VM test system up and running to reproduce the problem.
I am still trying to reproduce the error in a VM, explicitly using the devicemapper storage driver and even replicating my main system's setup of LVM on LUKS, but still no luck.
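In case it helps anyone else setting up such a VM: the devicemapper storage driver can be selected explicitly in /etc/docker/daemon.json. A minimal sketch, assuming a fresh Docker installation; whether this matches the affected setups is an assumption, since the traces only show that dm_mod is involved:

    # Write the daemon config and restart Docker
    cat <<'EOF' | sudo tee /etc/docker/daemon.json
    {
        "storage-driver": "devicemapper"
    }
    EOF
    sudo systemctl restart docker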
I've just encountered the very same bug. Don't know how I can help more.
Docker version 20.10.18, build b40c2f6b5d
uname -r: 5.19.10-arch1-1
However, it happened almost immediately after I started using Docker. I was using the default configuration, with my whole disk LUKS-encrypted, and things went south pretty fast, just as Patrick described, so I'd guess it is reproducible. Unfortunately, I don't have the time to set up a VM-based test bed to bisect the kernel.
Thanks for following up anyway,
[1] https://github.com/archlinux/linux/commit/4abc9965
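Since both traces go through dm_mod, it might also help to note which storage driver Docker is actually using on an affected machine; this was not part of the reports above, so treat it only as a suggestion:

    docker info --format '{{ .Driver }}'   # storage driver in use, e.g. overlay2 or devicemapper
    lsblk                                  # underlying LUKS/LVM layout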