FS#61179 - Cannot download packages with `+` in the name

Attached to Project: Pacman
Opened by Paul Davis (dangersalad) - Wednesday, 26 December 2018, 19:42 GMT
Last edited by Allan McRae (Allan) - Sunday, 04 December 2022, 06:30 GMT
Task Type Bug Report
Category Backend/Core
Status Closed
Assigned To Anatol Pomozov (anatolik)
Architecture All
Severity Low
Priority Normal
Reported Version 5.1.1
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 1
Private No

Details

I have a repo set up for my AUR packages and there is an issue when trying to sync the various `libc++` packages.

It seems that `+` is not escaped properly, so the web server treats it as a space when looking up the URL.

Closed by Allan McRae (Allan)
Sunday, 04 December 2022, 06:30 GMT
Reason for closing: No response
Additional comments about closing: Needs more details
Comment by Eli Schwartz (eschwartz) - Wednesday, 26 December 2018, 19:52 GMT
There are 22 packages in the official repositories which contain '+' in the name. Please provide more details regarding your problem.

$ pacman -Slq core extra community| grep -F +
dvd+rw-tools
foobillard++
libsigc++
libsigc++-docs
libstdc++5
libxml++
libxml++-docs
libxml++2.6
libxml++2.6-docs
memtest86+
timidity++
bonnie++
crypto++
dbus-c++
gtk2+extra
ls++
lucene++
mysql++
nicotine+
png++
tolua++
vsqlite++

There are 176 packages in the official repositories which contain a '+' somewhere in the download filename, including several packages essentially guaranteed to be on all Arch systems.

EDIT: I did not realize this wasn't already being done. I sort of assumed that if we weren't yet escaping this, the problem would have been reported before, and against repo packages.
Comment by Dave Reisner (falconindy) - Wednesday, 26 December 2018, 20:12 GMT
RFC 3986 states:

"If a reserved character is found in a URI component and no delimiting role is known for that character, then it must be interpreted as representing the data octet corresponding to that character's encoding in US-ASCII."

In the path component of a URI, "+" is one of the sub-delims, so it should be percent-encoded. The fact that servers do not seem to unescape "+" into a space seems to be an implementation decision rather than something the standard requires. Moreover, pacman doesn't prescribe any particular URL syntax, and a package could be fetched from a URL such as "http://repo.com/package?repo=core&package=libxml++" (with a Content-Disposition header containing the on-disk file name).

So, pacman is wrong, even if there are plenty of servers where this happens to work.
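
To make the failure mode concrete, here is a minimal sketch in C of a hypothetical server-side decoder. The function name `decode_path` and the `plus_as_space` switch are assumptions made for illustration; they are not taken from pacman, libcurl, or any particular web server. With form-style decoding enabled, a literal `+` in the request path turns into a space, while a percent-encoded `%2B` still decodes to `+` in either mode.

```c
#include <assert.h>
#include <stddef.h>

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Hypothetical decoder, for illustration only.
 * Writes the decoded form of `in` to `out`, which must hold at least
 * strlen(in) + 1 bytes. When plus_as_space is non-zero, form-style rules
 * apply and a literal '+' is decoded as a space.
 */
void decode_path(const char *in, char *out, int plus_as_space)
{
    assert(in != NULL && out != NULL);

    while (*in) {
        int hi, lo;

        if (in[0] == '%' && (hi = hex_value(in[1])) >= 0 &&
            (lo = hex_value(in[2])) >= 0) {
            /* A valid %XX sequence decodes to the octet it names. */
            *out++ = (char)(hi * 16 + lo);
            in += 3;
        } else if (*in == '+' && plus_as_space) {
            /* Form-style decoding: '+' stands for a space. */
            *out++ = ' ';
            in++;
        } else {
            /* Everything else is copied through unchanged. */
            *out++ = *in++;
        }
    }
    *out = '\0';
}
```

Under these assumptions, decoding `libc++` with `plus_as_space` set yields `libc` followed by two spaces, whereas decoding `libc%2B%2B` yields `libc++` in both modes, which is why sending the escaped form is safe regardless of how the server behaves.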
Comment by Paul Davis (dangersalad) - Wednesday, 06 February 2019, 22:18 GMT
The server in question for my bug report is a public AWS S3 bucket containing the repo database and packages.
Comment by Paul Davis (dangersalad) - Tuesday, 30 July 2019, 19:14 GMT
Tried to hack on this a bit. I made some changes that "work" in the sense that they allow the file to be downloaded, though my C skills are not great so I'm sure this isn't the "correct" way to do this.

   patch (1.1 KiB)
Comment by Allan McRae (Allan) - Saturday, 25 July 2020, 11:12 GMT
@Anatolik: can you have a quick look at this?
Comment by Anatol Pomozov (anatolik) - Tuesday, 28 July 2020, 04:59 GMT
I tried to reproduce this bug with several web servers (e.g. http://mirror.f4st.host/archlinux/community/os/x86_64/png++-0.2.10-2-any.pkg.tar.zst) and they handle "+" in the filename as if it is a valid URL. Paul, could you please give me an example of a repo that does not handle "+" in the URL? I would like to reproduce the problem with pacman master.

As for your patch: the curl documentation states that the "output" needs to be freed with the curl_free() function. Mixing memory allocators (FREE() vs curl_free()) is a bit of a PITA. I wonder if it would be better to just use the default memory allocator plus a simple URL escape function like this one: https://stackoverflow.com/a/21491633/576557
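
A minimal sketch of what such a standalone escape function could look like follows. The name `escape_filename` is hypothetical; this is neither the code from the attached patch nor the linked answer, and it assumes that only the filename part of the URL is passed in. Because the result comes from malloc(), the caller releases it with plain free() and never has to pair it with curl_free().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* RFC 3986 "unreserved" characters, which never need escaping. */
static int is_unreserved(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') ||
           c == '-' || c == '.' || c == '_' || c == '~';
}

/*
 * Hypothetical helper, not part of pacman or libcurl.
 * Returns a malloc()ed copy of `name` in which every byte outside the
 * unreserved set is percent-encoded, or NULL if allocation fails.
 * The caller releases the result with free().
 */
char *escape_filename(const char *name)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t len, i;
    char *out, *p;

    assert(name != NULL);
    len = strlen(name);

    /* Worst case: every byte expands to three characters ("%XX"). */
    if (len > (SIZE_MAX - 1) / 3) {
        return NULL;
    }
    out = malloc(len * 3 + 1);
    if (out == NULL) {
        return NULL;
    }

    p = out;
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)name[i];

        if (is_unreserved(c)) {
            *p++ = (char)c;
        } else {
            *p++ = '%';
            *p++ = hex[c >> 4];
            *p++ = hex[c & 0x0F];
        }
    }
    *p = '\0';
    return out;
}
```

With this sketch, `escape_filename("png++-0.2.10-2-any.pkg.tar.zst")` produces `png%2B%2B-0.2.10-2-any.pkg.tar.zst`. Escaping everything outside the unreserved set is deliberately conservative; it must be applied to the filename only, since running it over a full URL would also escape the `/` and `:` separators.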
