FS#71021 - [glibc] [containerd] realpath and paths with trailing slashes will EPERM on older kernels

Attached to Project: Arch Linux
Opened by Makoto Mizukami (makotom) - Wednesday, 26 May 2021, 03:24 GMT
Last edited by freswa (frederik) - Tuesday, 22 February 2022, 19:37 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To Giancarlo Razzolini (grazzolini)
freswa (frederik)
Architecture All
Severity Medium
Priority Normal
Reported Version 6.0.0
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Summary and Info:

As I understand [1], pacman has a default value of `DBPATH` as `/var/lib/pacman/` (_**with** a trailing slash_).
However, due to several changes introduced in glibc 2.33, this trailing slash causes `realpath(3)` to fail with EPERM when it is called on this path on older Linux kernels. (ALPM does call this function [2] [3].)
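
To check which errno `realpath(3)` reports for a given path, the C library function can be called directly. The following is only a minimal sketch, not part of pacman or ALPM: it assumes a Linux system where `ctypes.CDLL(None)` exposes the process's libc, and the helper name `libc_realpath` is made up for illustration.

```python
import ctypes
import errno
import os

# Assumption: on Linux, CDLL(None) gives access to the libc already
# loaded into the Python process.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.realpath.restype = ctypes.c_void_p
_libc.realpath.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
_libc.free.restype = None
_libc.free.argtypes = [ctypes.c_void_p]


def libc_realpath(path):
    """Call realpath(3) directly (hypothetical helper).

    Returns (resolved_path, None) on success, or (None, errno_name) on failure.
    """
    ctypes.set_errno(0)
    # Passing NULL as the second argument makes realpath() malloc the result.
    ptr = _libc.realpath(os.fsencode(path), None)
    if not ptr:
        err = ctypes.get_errno()
        return None, errno.errorcode.get(err, str(err))
    try:
        return os.fsdecode(ctypes.string_at(ptr)), None
    finally:
        _libc.free(ptr)


if __name__ == "__main__":
    # Compare the default DBPath with and without the trailing slash.
    for candidate in ("/var/lib/pacman", "/var/lib/pacman/"):
        print(candidate, "->", libc_realpath(candidate))
```

On an unaffected system both calls should behave the same way; in the environment described in this ticket, the variant with the trailing slash would be the one expected to report `EPERM`.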

Steps to Reproduce:

1. Set up a Docker host running a somewhat older Linux kernel.
The author followed this repro procedure using Ubuntu 18.04 LTS, which runs Linux kernel 4.15.0.
Note that this version of Ubuntu with this version of kernel is still supported by Canonical [4].

2. Set up a Docker container with the official `archlinux` Docker image and start a shell inside the container.

3. Populate the targeted version of `pacman` accordingly.
The author tested this procedure with pacman v5.2.2 and v6.0.0.
At this point, you may find that you already need the `-b` option to avoid the error discussed in this ticket. (See the analysis below.)

4. Run `pacman -S`

Expected behaviour:

The pacman command succeeds.

Actual behaviour:

The pacman command fails with an error message `could not find or read directory`.

Analysis:

The error can be suppressed by passing `-b /var/lib/pacman` command line argument (_**without** a trailing slash_).

This finding suggests that the issue can be fixed by removing the trailing slash (`/`) in the default value.
The author has not yet considered any side effects of removing the trailing slash from the default value.
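
As an illustration of the idea only (this is not pacman's code, and the helper name `strip_trailing_slashes` is hypothetical), a path could be normalized just before it is resolved, which would leave the configured default itself untouched:

```python
def strip_trailing_slashes(path):
    """Drop trailing '/' characters, keeping a bare root ('/') intact.

    Hypothetical helper for illustration; not taken from pacman or ALPM.
    """
    stripped = path.rstrip("/")
    # A non-empty path made only of slashes refers to the root directory,
    # so return a single slash instead of an empty string.
    return stripped if stripped or not path else "/"


if __name__ == "__main__":
    # The default DBPath from this ticket, normalized for use with -b.
    print(strip_trailing_slashes("/var/lib/pacman/"))
```

The result is the same string that was passed to `-b` in the workaround above.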

See [5] also.

[1] https://git.archlinux.org/pacman.git/tree/meson.build#n71
[2] https://git.archlinux.org/pacman.git/tree/lib/libalpm/alpm.c#n50
[3] https://git.archlinux.org/pacman.git/tree/lib/libalpm/handle.c#n418
[4] https://ubuntu.com/kernel/lifecycle#installation-18-04
[5] https://github.com/circle-makotom/demo-20210526-realpath

Closed by  freswa (frederik)
Tuesday, 22 February 2022, 19:37 GMT
Reason for closing:  Won't fix
Comment by Allan McRae (Allan) - Wednesday, 26 May 2021, 03:34 GMT
Was this an intended change in glibc-2.33? Sounds like something they would want to fix.
Comment by Eli Schwartz (eschwartz) - Wednesday, 26 May 2021, 03:36 GMT
Given this only occurs when you upgrade to glibc 2.33 and you're *also* using docker, have you checked whether this is in fact actually a docker bug like the well-known FS#69563?

In fact, if glibc realpath() is actually EPERMing at you when it didn't used to, doesn't this seem obviously like a glibc bug or something else in the depths of your execution environment, which badly needs to be fixed rather than being papered over by changing applications to not use slashes?

I don't like the idea of removing this slash as it affects the output of "pacman-conf DBPath" and this already regressed in the meson port which actively broke third-party scripts. Like pacman-contrib.
Comment by Makoto Mizukami (makotom) - Wednesday, 26 May 2021, 15:04 GMT
Ah, thanks for the good catch. I just realized that the issue does not occur if I directly execute (without Docker) a binary statically linked against glibc 2.33. That means it is Docker-dependent (and in fact I confirmed it is containerd-dependent).

Now I am just curious whether Arch could take active measures against this, since I have heard reports that the issue bothers several users; you can find some of them even on Google. (Though it now sounds to me like the side effects of removing the trailing slash from the default value are considerable.)
Comment by Eli Schwartz (eschwartz) - Wednesday, 26 May 2021, 15:38 GMT
I don't believe there's anything we could do other than following Debian stable and not upgrading glibc until after some other distro like Fedora Rawhide exposes such containerd bugs first.

In any event, does this happen on virtualization platforms that provide the latest version of docker/containerd/etc. or did the system you tested this on still distribute a slightly older version of docker?

In other words, is this a new bug not solved by  FS#69563 ?
Comment by Makoto Mizukami (makotom) - Wednesday, 26 May 2021, 17:23 GMT
> In any event, does this happen on virtualization platforms that provide the latest version of docker/containerd/etc. or did the system you tested this on still distribute a slightly older version of docker?
> In other words, is this a new bug not solved by  FS#69563  ?

I was on a slightly older version of Docker and containerd (docker-ce_19.03.15 + containerd.io_1.4.3-1, distributed by Docker for Ubuntu).
I'm not fully confident that I understand the context of FS#69563, but I feel this is related to FS#69563 and not solved by it.

(Seemingly it was "won't fix" in FS#69563 because it is in the scope of the underlying infrastructure and not in that of Arch...? Still, I believe the lives of container consumers would be much easier if it could be remedied with in-container approaches, e.g., patches in glibc.)
Comment by freswa (frederik) - Sunday, 20 February 2022, 20:33 GMT
Is this still an issue with glibc 2.35?
Comment by Makoto Mizukami (makotom) - Monday, 21 February 2022, 05:21 GMT
> Is this still an issue with glibc 2.35?

I confirmed that the issue still occurs with glibc 2.35.

That said, since this happens only with somewhat old versions of containerd, I would not rule out marking it as Won't Fix, stating that users should run Arch Docker images on sufficiently new infrastructure.
Comment by Giancarlo Razzolini (grazzolini) - Tuesday, 22 February 2022, 19:26 GMT
I think we should probably close this as WONT_FIX. We only support 4 kernels.
