FS#76807 - [Wiki] A space followed by an exclamation mark, broken in code snippets

Attached to Project: Arch Linux
Opened by mpan (mpan) - Monday, 12 December 2022, 08:41 GMT
Last edited by Buggy McBugFace (bugbot) - Saturday, 25 November 2023, 20:22 GMT
Task Type Bug Report
Category Web Sites
Status Closed
Assigned To Pierre Schmitz (Pierre)
Architecture All
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Sequences of “U+0020 U+0021” (space, exclamation mark) are being replaced with “U+00A0 U+0021” (nbsp, exclamation mark). While that may be desirable for the French language, it breaks code snippets.

Affected templates include at least {{ic}}, {{hc}}, {{bc}} and raw <code>.

Behavior observed for firefox-107.0.1-1, konqueror-22.12.0-1, netsurf-3.10-7, elinks-0.15.1-2 and dillo-3.0.5-12.

Not observed in links-2.28-2, lynx-2.8.9-6, falkon-22.12.0-1, seamonkey-2.53.14-2, otter-browser-1.0.03-2, midori-9.0-4, qutebrowser-2.5.2-2, luakit-2.3.3-1, epiphany-43.0-1, eolie-0.9.101-3 and w3m-0.5.3.git20220409_1-2.

Currently the issue may be circumvented by injecting something between the space and exclamation mark, for example a `<i></i>`, or by using an equivalent code snippet, for example enclosing the exclamation mark in quotes for shell.
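
To make the report concrete, here is a minimal Python sketch that imitates the observed substitution and checks the two workarounds against it. `simulate_render` is a hypothetical stand-in written for illustration; it is not MediaWiki's code and only models the space-plus-exclamation-mark case described above. The shell snippet is likewise just an example.

```python
NBSP = "\u00a0"  # U+00A0, non-breaking space


def simulate_render(text: str) -> str:
    """Imitate the reported behaviour: U+0020 U+0021 becomes U+00A0 U+0021.

    Hypothetical stand-in for illustration only, not MediaWiki code.
    """
    return text.replace(" !", NBSP + "!")


# A shell test that relies on a plain space before the "!".
plain = "if [ ! -f /etc/fstab ]; then echo missing; fi"
# Workaround 1: an empty <i></i> injected between the space and the "!".
with_tag = "if [ <i></i>! -f /etc/fstab ]; then echo missing; fi"
# Workaround 2: an equivalent snippet with the "!" enclosed in quotes.
quoted = "if [ '!' -f /etc/fstab ]; then echo missing; fi"

for label, text in (("plain", plain), ("with_tag", with_tag), ("quoted", quoted)):
    rendered = simulate_render(text)
    # True means the snippet picked up a non-breaking space.
    print(label, NBSP in rendered)
```

Under this simplified model only the plain snippet is altered; the other two never contain the adjacent space and exclamation mark that the substitution looks for.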

Closed by  Buggy McBugFace (bugbot)
Saturday, 25 November 2023, 20:22 GMT
Reason for closing:  Moved
Additional comments about closing:  https://gitlab.archlinux.org/archlinux/packaging/bug-repo/issues/2
Comment by Erus (Erus_Iluvatar) - Monday, 12 December 2022, 08:48 GMT
Comment by Pierre Schmitz (Pierre) - Wednesday, 14 December 2022, 09:47 GMT
I am not sure what to do here. I doubt this is caused by our custom extensions for MediaWiki.
Comment by mpan (mpan) - Wednesday, 14 December 2022, 14:32 GMT
The cause is on MediaWiki’s end and I see no sane long-term solution other than the upstream changing the behavior or providing convenient means of avoiding it.

A short-term option is Arch applying changes to {{ic}}, {{bc}} and {{hc}}, if such changes are possible. Initially I was thinking about using string replacement to either inject `<i></i>` between the characters or to replace any (U+00A0, U+0021) with (U+0020, U+0021), depending on the stage at which the offending non-breaking space is being introduced. However, it seems Arch Wiki has no string replacement templates. With that option not available, I am myself out of ideas.
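
For illustration, the two string-replacement options could be sketched in Python as below. Both function names are hypothetical, and this is only a model of the idea: as noted, the wiki has no string replacement templates, so nothing like this exists there today.

```python
NBSP = "\u00a0"  # U+00A0, non-breaking space


def inject_separator(source: str) -> str:
    """Option 1 (before rendering): split 'space, !' with an empty <i></i>."""
    return source.replace(" !", " <i></i>!")


def restore_spaces(rendered: str) -> str:
    """Option 2 (after rendering): turn (U+00A0, U+0021) back into (U+0020, U+0021)."""
    return rendered.replace(NBSP + "!", " !")


print(inject_separator("[ ! -e file ]"))
print(restore_spaces("[" + NBSP + "! -e file ]"))
```

Which of the two would apply depends on where in the pipeline the non-breaking space is introduced, which is the open question in this comment.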

Otherwise we are stuck with avoiding the issue. In the worst case this bug could be closed after introducing a suitable warning in < https://wiki.archlinux.org/title/Help:Style/Formatting_and_punctuation >. But that would be suboptimal and relying on users to circumvent the bug is likely to fail.

Arch Wiki already has bots, so possibly one of them could take over detecting instances of this issue?
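
A rough sketch of what such a detection pass might look like, in Python. Everything here is an assumption made for illustration: the function name, the regular expressions, and the choice to flag ":" and ";" alongside "!" (those two are reported elsewhere in this task). It does not use any real bot framework and does not handle nested templates.

```python
import re

# Code-like regions named in this report: {{ic}}, {{bc}}, {{hc}} and raw <code>.
TEMPLATE_RE = re.compile(r"\{\{(?:ic|bc|hc)\|(.*?)\}\}", re.DOTALL | re.IGNORECASE)
CODE_RE = re.compile(r"<code>(.*?)</code>", re.DOTALL | re.IGNORECASE)
# A plain space followed by one of the affected punctuation characters.
SUSPECT_RE = re.compile(r" [!:;]")


def find_suspects(wikitext: str) -> list[tuple[int, str]]:
    """Return (offset, two-character match) pairs found inside code-like regions."""
    hits = []
    for region_re in (TEMPLATE_RE, CODE_RE):
        for region in region_re.finditer(wikitext):
            body = region.group(1)
            for suspect in SUSPECT_RE.finditer(body):
                # Translate the offset inside the region to an offset in the page.
                hits.append((region.start(1) + suspect.start(), suspect.group(0)))
    return sorted(hits)


sample = "Run {{ic|[ ! -f x ]}} now ! Or <code>a ; b</code>."
for offset, match in find_suspects(sample):
    print(offset, repr(match))
```

The " !" in the surrounding prose of the sample is deliberately not reported, since only code snippets are affected in a harmful way.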
Comment by Erus (Erus_Iluvatar) - Wednesday, 14 December 2022, 15:00 GMT
The issue is also more widespread than just the exclamation point: it can also be found in combination with ":" or ";" (the latter has shown itself this morning in https://wiki.archlinux.org/index.php?title=Wine&curid=1414&diff=760317&oldid=759672).

Sadly, I don't have any good ideas either to mitigate this.
Comment by nl6720 (nl6720) - Wednesday, 14 December 2022, 15:17 GMT
Unless I'm mistaken, this should be the function in question: https://github.com/archlinux/archwiki/blob/1.38.4-1/includes/parser/Sanitizer.php#L879-L890
