FS#15526 - rpc returns a null description if $pkgdesc contains unicode? characters

Attached to Project: AUR web interface
Opened by Randy Morris (rson451) - Thursday, 16 July 2009, 00:03 GMT
Last edited by Roman Kyrylych (Romashka) - Saturday, 03 October 2009, 21:27 GMT
Task Type Bug Report
Category Backend
Status Closed
Assigned To No-one
Architecture All
Severity Low
Priority Normal
Reported Version 1.5.6.2
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 1
Private No

Details

Package 'thunderbird-spell-ru-yo' contains a '£' in its pkgdesc. When viewing the package on the AUR interface[1], the pkgdesc shows up. When queried via the rpc interface[2], the description returns null. I have not yet found another package with similar characters in its pkgdesc to reproduce the issue.

A quick look at the rpc code did not immediately reveal the problem.

[1]: http://aur.archlinux.org/packages.php?ID=12575
[2]: http://aur.archlinux.org/rpc.php?type=search&arg=thunderbird-spell-ru-yo

Closed by  Roman Kyrylych (Romashka)
Saturday, 03 October 2009, 21:27 GMT
Reason for closing:  Fixed
Additional comments about closing:  Seems to be fixed: http://projects.archlinux.org/?p=aur.git;a=commit;h=4d1eb4dd7ac631138af5e7391eda1d8f2829f555
Comment by Gergely (imrehg) - Saturday, 08 August 2009, 03:53 GMT
Hi, can you check that your character is really UTF-8? On the web interface it does not display correctly for me with the standard encoding (AUR uses UTF-8); I have to manually change it to Windows-1252...

Also, I'd strongly advise against using non-latin characters anywhere in the PKGBUILDs, whenever it is not absolutely necessary (such as maybe Maintainers' names...)
Comment by Randy Morris (rson451) - Monday, 10 August 2009, 23:57 GMT
To be honest, I just assumed the character was UTF-8; I really don't know for sure. I noticed this package when it broke the script I use to search, check for updates, and download packages from the AUR. I'd suggest not using non-latin characters at all, but there is an issue here that needs to be resolved one way or another.
Comment by Gergely (imrehg) - Tuesday, 11 August 2009, 03:25 GMT
Then try the package using UTF-8 first... Wikipedia is your friend (in this as well): Ё and ё should be the correct letter. I'm just copy-pasting it from the website, so let's see how it looks...
Comment by Randy Morris (rson451) - Tuesday, 08 September 2009, 12:42 GMT
I'm not sure you understand. This is not my package. I can not update the package and see if that fixes it. Either way *something* should be changed so that packages with this or any other invalid character won't break the RPC. I say *something* because I don't really know where the problem lies, the RPC code seems fine to me.
Comment by Gergely (imrehg) - Wednesday, 09 September 2009, 18:07 GMT
I understand indeed.
The usual way of fixing bad packages (as this one is) on AUR is to contact the package maintainer. Maybe even fix the package yourself (just copy-paste that Ё from here over the wrong one in the PKGBUILD) and send it to them, with some explanation of why it does not work. If they do not want to fix it (doubt it) or do not reply (possible), then you have other ways to get the necessary changes done. But getting in touch is the first one.

For robustness' sake: I sent out a fix to the mailing list for the rpc (actually, the aurjson section). AUR tries to convert non-UTF-8 fields; if that fails, the output will contain a proper error message in that field.
With this, the above package will return: "Description":"Russian (with \u00a3 [yo]) spellchecker dictionary for Thunderbird"
http://mailman.archlinux.org/pipermail/aur-dev/2009-September/000868.html
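
The convert-or-report behaviour described above can be sketched roughly as follows. This is only an illustration in Python, not the actual aurjson patch (which is PHP): the function name, the choice of Windows-1252 as the fallback encoding, and the error text are all assumptions made for the example.

```python
import json

# Hypothetical placeholder text; the real patch's error message is not shown in this thread.
ENCODING_ERROR = "[field is not valid UTF-8 and could not be converted]"


def to_json_safe_text(raw: bytes) -> str:
    """Return the field as text, converting non-UTF-8 input where possible."""
    try:
        # Well-formed UTF-8 passes through unchanged.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    try:
        # Assumption: a non-UTF-8 field is most likely Windows-1252.
        return raw.decode("cp1252")
    except UnicodeDecodeError:
        # Conversion failed too: report an error in the field instead of null.
        return ENCODING_ERROR


# The description as raw bytes, with a lone 0xA3 byte (not valid UTF-8).
desc = b"Russian (with \xa3 [yo]) spellchecker dictionary for Thunderbird"
print(json.dumps({"Description": to_json_safe_text(desc)}))
```

With the lone 0xA3 byte read as Windows-1252, the character becomes U+00A3 and the JSON encoder escapes it as \u00a3, which matches the output quoted above.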