FS#61605 : AUR web: Comments with Unicode characters are silently discarded

Attached to Project: AUR web interface
Opened by Alberto Salvia Novella (es20490446e) - Friday, 01 February 2019, 23:31 GMT
Last edited by Lukas Fleischer (lfleischer) - Tuesday, 21 April 2020, 16:07 GMT

Task Type	Bug Report
Category	General
Status	Closed
Assigned To	No-one
Architecture	All
Severity	Low
Priority	Normal
Reported Version	git
Due in Version	Undecided
Due Date	Undecided
Percent Complete
Votes	1 Daniel M. Capella (polyzen) (2019-02-03)
Private	No

Details

HOW TO REPRODUCE:
- On an AUR package page, add a comment containing a Unicode pictograph (https://getemoji.com/)

RESULT:
- The comment is silently discarded.

Closed by Lukas Fleischer (lfleischer)
Tuesday, 21 April 2020, 16:07 GMT
Reason for closing: Fixed

Comment by Eli Schwartz (eschwartz) - Sunday, 03 February 2019, 00:38 GMT

That actually sounds like a really cool idea, but unfortunately, as far as I can tell, this works fine. I haven't tested on aur.archlinux.org, as I have nothing to comment on anywhere at the moment and no real interest in bothering people with junk comments just to test this. However, I've trialled it on a local test instance of the aurweb codebase, and I can submit comments with whatever sort of Unicode I want.

unicode-comments.png (54.2 KiB)

Comment by Alberto Salvia Novella (es20490446e) - Sunday, 03 February 2019, 03:31 GMT

On the real website it doesn't work. The comments don't appear, so testing on the website itself has no consequences:
https://youtu.be/M0UlMpA-7pY

Comment by Eli Schwartz (eschwartz) - Sunday, 03 February 2019, 05:02 GMT

I've redacted your off-topic, irrelevant attempt at derailing this bug report, and I strongly encourage you to stop picking fights over pacman development in unrelated bug reports, assuming you know what's good for you.

On the topic of this bug report: if the report is correct, there must be something different about the AUR that makes this fail in production, and the only difference that makes sense is that I'm using SQLite while the server is using MariaDB. As far as I know, MariaDB should support Unicode just fine, but digging around, the settings look a bit odd:

>>> import aurweb.db
>>> from pprint import pprint
>>> conn = aurweb.db.Connection()
>>> cur = conn.execute(r"SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%'")
>>> pprint(cur.fetchall())
[('character_set_client', 'utf8mb4'),
 ('character_set_connection', 'utf8mb4'),
 ('character_set_database', 'utf8'),
 ('character_set_filesystem', 'binary'),
 ('character_set_results', 'utf8mb4'),
 ('character_set_server', 'utf8mb4'),
 ('character_set_system', 'utf8'),
 ('collation_connection', 'utf8mb4_general_ci'),
 ('collation_database', 'utf8_general_ci'),
 ('collation_server', 'utf8mb4_general_ci')]
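In the output above, the two values that stand out are `character_set_database` and `collation_database`, which are still on 3-byte `utf8` while the client, connection, results and server settings use `utf8mb4`. A minimal sketch of filtering such rows out of the `(Variable_name, Value)` tuples returned by `cur.fetchall()`; the helper `non_mb4_settings` is hypothetical and not part of aurweb. `character_set_system` is skipped because MySQL/MariaDB always use 3-byte utf8 for identifiers there, and `character_set_filesystem` only concerns file names.

```python
from typing import Iterable

# Variables that do not describe how comment data is stored:
# character_set_system (identifiers/metadata) and
# character_set_filesystem (file names).
IGNORED = {"character_set_system", "character_set_filesystem"}


def non_mb4_settings(rows: Iterable[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return charset/collation variables whose value is not utf8mb4-based.

    Hypothetical helper; expects (Variable_name, Value) pairs such as those
    returned by SHOW VARIABLES.
    """
    return [
        (name, value)
        for name, value in rows
        if name not in IGNORED and not value.startswith("utf8mb4")
    ]


# The rows from the transcript above.
rows = [
    ("character_set_client", "utf8mb4"),
    ("character_set_connection", "utf8mb4"),
    ("character_set_database", "utf8"),
    ("character_set_filesystem", "binary"),
    ("character_set_results", "utf8mb4"),
    ("character_set_server", "utf8mb4"),
    ("character_set_system", "utf8"),
    ("collation_connection", "utf8mb4_general_ci"),
    ("collation_database", "utf8_general_ci"),
    ("collation_server", "utf8mb4_general_ci"),
]

print(non_mb4_settings(rows))
# [('character_set_database', 'utf8'), ('collation_database', 'utf8_general_ci')]
```

Note that the database-level default only applies to tables created afterwards; existing tables and columns keep their own character set, so those would need to be checked separately (for example with `SHOW CREATE TABLE`).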

I will punt to lfleischer on this. IIRC, MySQL is weird about utf8, which really isn't full UTF-8 unless you use the mb4 version: plain utf8 stores at most 3 bytes per character. So it sounds like, in order to support annoying people who use Unicode emoji to communicate serious messages, we might need to change some of these from utf8 to utf8mb4? This would be a database-level problem...

I know Unicode currently works for most users, at least to the extent that, say, Chinese can be inserted correctly. But those characters are 3-byte UTF-8 sequences, not 4-byte ones...
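The 3-byte versus 4-byte distinction is easy to check from Python. MySQL's `utf8` (also known as `utf8mb3`) stores at most 3 bytes per character, which covers the Basic Multilingual Plane including CJK text, while emoji such as U+1F600 lie outside it and take 4 bytes in UTF-8. A minimal sketch; `utf8_widths` and `fits_utf8mb3` are hypothetical helpers, not part of aurweb, that could be used to spot comments a `utf8` column cannot store.

```python
def utf8_widths(text: str) -> list[int]:
    """Number of bytes each character occupies when encoded as UTF-8."""
    return [len(ch.encode("utf-8")) for ch in text]


def fits_utf8mb3(text: str) -> bool:
    """True if every character needs at most 3 UTF-8 bytes.

    MySQL's 3-byte utf8/utf8mb3 can only store such characters; anything
    outside the Basic Multilingual Plane (e.g. most emoji) needs utf8mb4.
    """
    return all(width <= 3 for width in utf8_widths(text))


print(utf8_widths("aé中😀"))  # [1, 2, 3, 4]
print(fits_utf8mb3("中文"))   # True
print(fits_utf8mb3("👍"))     # False
```

Equivalently, a character needs 4 bytes exactly when `ord(ch) > 0xFFFF`, which is why Chinese comments were stored fine while emoji comments were not.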
