FS#40762 - [pacman] pacman -Qkk wrongly warns about filename with accents

Attached to Project: Pacman
Opened by Alain Kalker (ackalker) - Sunday, 08 June 2014, 21:07 GMT
Last edited by Allan McRae (Allan) - Monday, 04 August 2014, 04:57 GMT
Task Type Bug Report
Category General
Status Closed
Assigned To Allan McRae (Allan)
Architecture All
Severity Critical
Priority Normal
Reported Version 4.1.2
Due in Version 4.2.0
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:

Running `pacman -Qkk` on my system gives the following warning:

warning: atom-editor-git: /usr/share/atom/resources/app/node_modules/markdown-preview/spec/fixtures/subdir/áccéntéd.md (No such file or directory)

The mentioned file does exist:

$ ls -l /usr/share/atom/resources/app/node_modules/markdown-preview/spec/fixtures/subdir/áccéntéd.md
-rw-r--r-- 1 root root 10 Jun 6 17:57 /usr/share/atom/resources/app/node_modules/markdown-preview/spec/fixtures/subdir/áccéntéd.md

(Note that there are accented letters in the filename; I don't know if flyswat will retain them.)

On a hunch, I ran the command with LC_ALL=C. It made no difference: the same wrong warning appeared.

Additional info:
* package version(s)
pacman 4.1.2-6
* config and/or log files etc.
$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Steps to reproduce:

Closed by Allan McRae (Allan)
Monday, 04 August 2014, 04:57 GMT
Reason for closing: Fixed
Additional comments about closing: Fixed for pacman-4.2 with git commit 537a335c
Comment by Alain Kalker (ackalker) - Sunday, 08 June 2014, 21:34 GMT
Hmm, unable to reproduce with this quick'n'dirty PKGBUILD and the attached file from the report:

---
# Quick'n'dirty PKGBUILD for testing FS#40762

# Maintainer: Your Name <youremail@domain.com>
pkgname=accenttest
pkgver=0.1
pkgrel=1
pkgdesc="Test for accented filename check bug (FS#40762)"
arch=('any')
url=""
license=('GPL')
source=(áccéntéd.md)
md5sums=('fea409ba1e5bfc20b34b6f9053ec164b')

package() {
  install -D -m644 "$srcdir/áccéntéd.md" "$pkgdir/usr/share/$pkgname/áccéntéd.md"
}
---

Weird. Could it be the somewhat long pathname? If not, I'm afraid there is no other option for reproducing but to actually build the 'atom-editor-git' package from the AUR and find out... :-)
Comment by Alain Kalker (ackalker) - Sunday, 08 June 2014, 21:45 GMT
Forgot to add: there are more files in the directory containing the offending file, but pacman doesn't warn about those:

$ ls -l /usr/share/atom/resources/app/node_modules/markdown-preview/spec/fixtures/subdir/
total 24
-rw-r--r-- 1 root root 10 Jun 6 17:57 áccéntéd.md
-rw-r--r-- 1 root root 135 Jun 6 17:57 evil.md
-rw-r--r-- 1 root root 340 Jun 6 17:57 file.markdown
-rw-r--r-- 1 root root 10 Jun 6 17:57 file with space.md
-rw-r--r-- 1 root root 21 Jun 6 17:57 html-tag.md
-rw-r--r-- 1 root root 19 Jun 6 17:57 simple.md
Comment by Andrew Gregory (andrewgregory) - Monday, 09 June 2014, 15:35 GMT
The problem appears to be that the accented file name uses combining Unicode characters, which get normalized in the archive but are preserved in the MTREE file. pacman-git won't even install the file because of the difference.
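
To illustrate the mismatch described in this comment, here is a minimal Python sketch. It assumes the on-disk name is in decomposed form (base letter plus combining accent) and that the normalized copy is in precomposed form. The helper name `same_name` is hypothetical, and nothing here is taken from pacman or libarchive; it only shows why two names that look identical can fail a byte-for-byte comparison.

```python
import unicodedata

# Precomposed spelling: each accented letter is a single code point.
precomposed = "\u00e1cc\u00e9nt\u00e9d.md"

# Decomposed spelling: each accented letter becomes a base letter
# followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = unicodedata.normalize("NFD", precomposed)


def same_name(a: str, b: str) -> bool:
    """Compare two file names after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The two spellings render the same but are different byte sequences,
# so a lookup using one spelling will not find a file stored under the other.
print(precomposed.encode("utf-8") == decomposed.encode("utf-8"))
print(same_name(precomposed, decomposed))
```

If one list of file names holds the normalized spelling and another holds the original one, a plain string comparison between them fails even though both refer to the same file.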
Comment by Allan McRae (Allan) - Thursday, 12 June 2014, 01:49 GMT
@Alain: What is your system locale? Did you build the problem package yourself?
Comment by Allan McRae (Allan) - Thursday, 12 June 2014, 02:18 GMT
Building the package in the C locale gives a warning:

bsdtar: usr/share/atom/resources/app/node_modules/markdown-preview/spec/fixtures/subdir/áccéntéd.md: Can't translate pathname 'usr/share/atom/resources/app/node_modules/markdown-preview/spec/fixtures/subdir/áccéntéd.md' to UTF-8

But the package will install fine.

This is why we force the C locale in devtools. I'm thinking we should force it in all bsdtar calls in makepkg too.
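
As a loose analogy for the bsdtar warning above (not bsdtar's actual code), the Python sketch below assumes the C locale treats pathnames as ASCII. The function name `decode_in_locale` is hypothetical. Bytes above 0x7F cannot be interpreted in that charset, so no conversion to UTF-8 is possible and the raw bytes have to be kept unchanged.

```python
from typing import Optional


def decode_in_locale(raw: bytes, charset: str) -> Optional[str]:
    """Try to interpret raw pathname bytes in the given charset.

    Returns None when the bytes are not valid in that charset, which is
    the situation the "Can't translate pathname" warning describes.
    """
    try:
        return raw.decode(charset)
    except UnicodeDecodeError:
        return None


# The pathname as UTF-8 bytes, as a UTF-8 system would store it on disk.
raw_name = "\u00e1cc\u00e9nt\u00e9d.md".encode("utf-8")

print(decode_in_locale(raw_name, "ascii"))   # the assumed C-locale charset
print(decode_in_locale(raw_name, "utf-8"))   # a UTF-8 locale
```

Under this assumption, a tool running in the C locale has nothing to normalize and passes the bytes through as they are, which would be consistent with the package installing fine despite the warning.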
Comment by Alain Kalker (ackalker) - Thursday, 12 June 2014, 15:11 GMT
I built the package myself, using the locale 'en_US.UTF-8'. Please see the output of `$ locale` which I put at the bottom of the report.

For my test 'accenttest' package from my first comment, I copied the file usr/share/$pkgname/áccéntéd.md into the directory containing the PKGBUILD, then I built, installed and checked the package with `$ pacman -Qkk accenttest`, all without any problems.
Very strange indeed; perhaps the install step in the 'atom-editor-git' package does something odd with the locale settings.
Comment by Alain Kalker (ackalker) - Thursday, 12 June 2014, 15:13 GMT
Oops, usr/share/$pkgname/áccéntéd.md should be /usr/share/atom/resources/app/node_modules/markdown-preview/spec/fixtures/subdir/áccéntéd.md, i.e. the file which failed the check.
Comment by Allan McRae (Allan) - Wednesday, 25 June 2014, 13:09 GMT
@Alain: can you build with "LANG=C makepkg" and then test pacman -Qkk?
Comment by Alain Kalker (ackalker) - Sunday, 20 July 2014, 18:44 GMT
@Allan: Sorry for the delay. Here are the results of building with "LANG=C makepkg":

"namcap atom-editor-git-20140720-1-x86_64.pkg.tar.xz" shows (among others) the following warning:
atom-editor-git W: File name usr/share/atom/resources/app/node_modules/markdown-preview/spec/fixtures/subdir/áccéntód.md contains non standard characters

After installing, "pacman -Qkk atom-editor-git" shows _no_ errors or warnings:
atom-editor-git: 11544 total files, 0 altered files
Comment by Alain Kalker (ackalker) - Sunday, 20 July 2014, 18:50 GMT
(In my previous post, the second accented e in 'accented' somehow got remapped to a different accented letter; it was correct in the terminal from which I mouse-copy-pasted it. Ah, the joys of interoperability...)
