FS#61605 - AUR web: Comments with Unicode characters are silently discarded

Attached to Project: AUR web interface
Opened by Alberto Salvia Novella (es20490446e) - Friday, 01 February 2019, 23:31 GMT
Last edited by Lukas Fleischer (lfleischer) - Tuesday, 21 April 2020, 16:07 GMT
Task Type Bug Report
Category General
Status Closed
Assigned To No-one
Architecture All
Severity Low
Priority Normal
Reported Version git
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 1
Private No

Details

HOW TO REPRODUCE:
- On an AUR package page, add a comment containing a Unicode pictograph (https://getemoji.com/)

RESULT:
- The comment is silently discarded.

Closed by Lukas Fleischer (lfleischer)
Tuesday, 21 April 2020, 16:07 GMT
Reason for closing: Fixed
Comment by Eli Schwartz (eschwartz) - Sunday, 03 February 2019, 00:38 GMT
That actually sounds like a really cool idea, but unfortunately as far as I can tell, this works fine. I haven't tested on aur.archlinux.org as I have nothing to comment anywhere at the moment and no real interest in bothering people with junk comments just to test this -- but I've trialled it on a local test instance of the aurweb codebase, and I can submit comments with whatever sort of unicode I want.

Comment by Alberto Salvia Novella (es20490446e) - Sunday, 03 February 2019, 03:31 GMT
On the real website it doesn't work. The comments never appear, so testing on the live site has no consequences:
https://youtu.be/M0UlMpA-7pY
Comment by Eli Schwartz (eschwartz) - Sunday, 03 February 2019, 05:02 GMT
I've redacted your offtopic irrelevant attempt at derailing this bug report, and I strongly encourage you to stop picking fights over pacman development in unrelated bug reports. Assuming you know what's good for you.

On the topic of this bug report: if the bug report is correct, there must be something different about the AUR that makes this not work in production -- but the only difference that makes sense is that I'm using SQLite and the server is using MariaDB. As far as I know MariaDB should support Unicode just fine, but digging around, the settings look a bit odd:

>>> import aurweb.db
>>> from pprint import pprint
>>> conn = aurweb.db.Connection()
>>> cur = conn.execute("SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%'")
>>> pprint(cur.fetchall())
[('character_set_client', 'utf8mb4'),
 ('character_set_connection', 'utf8mb4'),
 ('character_set_database', 'utf8'),
 ('character_set_filesystem', 'binary'),
 ('character_set_results', 'utf8mb4'),
 ('character_set_server', 'utf8mb4'),
 ('character_set_system', 'utf8'),
 ('collation_connection', 'utf8mb4_general_ci'),
 ('collation_database', 'utf8_general_ci'),
 ('collation_server', 'utf8mb4_general_ci')]

I will punt to lfleischer on this. IIRC MySQL is weird about utf8, which isn't really full UTF-8 unless you use the mb4 version. So it sounds like in order to support annoying people who use Unicode emoji in order to communicate serious messages, we might need to change some of these (the database character set and collation) from utf8 to utf8mb4? This would be a database level problem...
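
To make the mismatch in that output explicit, here is a minimal sketch that filters the rows for settings still on the 3-byte utf8 charset. The helper name legacy_utf8_settings is hypothetical and not part of aurweb; the rows are copied from the output above. character_set_system is skipped on the assumption that it only covers server metadata and is not something to change here.

```python
# Rows copied from the SHOW VARIABLES output quoted in this comment.
ROWS = [
    ("character_set_client", "utf8mb4"),
    ("character_set_connection", "utf8mb4"),
    ("character_set_database", "utf8"),
    ("character_set_filesystem", "binary"),
    ("character_set_results", "utf8mb4"),
    ("character_set_server", "utf8mb4"),
    ("character_set_system", "utf8"),
    ("collation_connection", "utf8mb4_general_ci"),
    ("collation_database", "utf8_general_ci"),
    ("collation_server", "utf8mb4_general_ci"),
]


def legacy_utf8_settings(rows):
    """Return (name, value) pairs that use 3-byte utf8 instead of utf8mb4.

    Hypothetical helper for illustration only; not part of aurweb.
    """
    flagged = []
    for name, value in rows:
        # Assumption: character_set_system describes server metadata,
        # so it is not a candidate for conversion and is skipped.
        if name == "character_set_system":
            continue
        if value.startswith("utf8") and not value.startswith("utf8mb4"):
            flagged.append((name, value))
    return flagged


for name, value in legacy_utf8_settings(ROWS):
    print(name, value)
```

With the rows above, only the two database-level settings are flagged, which fits the guess that this is a database level problem rather than a connection one.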

I know Unicode currently works for most users, at least to the extent that, say, Chinese can be correctly inserted. But those characters use 3-byte UTF-8 sequences, not 4-byte ones...
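
The 3-byte versus 4-byte distinction can be checked directly in Python. This is a rough sketch: the helper names utf8_len and fits_in_utf8mb3 are made up for illustration, and "utf8mb3" refers to MySQL's 3-byte utf8 charset. It only measures encoded length and says nothing about what aurweb or the database actually does with a rejected comment.

```python
def utf8_len(ch):
    """Number of bytes a single character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_utf8mb3(text):
    """True if every character in text needs at most 3 bytes in UTF-8.

    Hypothetical helper: characters needing 4 bytes (such as most emoji)
    cannot be stored in a column that uses MySQL's 3-byte utf8 charset.
    """
    return all(utf8_len(ch) <= 3 for ch in text)


samples = {
    "latin": "a",            # U+0061
    "chinese": "\u4e2d",     # U+4E2D
    "emoji": "\U0001F600",   # U+1F600, grinning face
}

for label, ch in samples.items():
    print(label, utf8_len(ch), fits_in_utf8mb3(ch))
```

The Latin and Chinese samples fit within 3 bytes, while the emoji needs 4, which matches the observation that Chinese comments work and emoji comments do not.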
