FS#41785 - {wiki} Corrupted link in wiki email notification

Attached to Project: Arch Linux
Opened by Developer Laptander (laptander) - Monday, 01 September 2014, 19:25 GMT
Last edited by Pierre Schmitz (Pierre) - Friday, 17 September 2021, 10:03 GMT
Task Type Bug Report
Category Web Sites
Status Closed
Assigned To Pierre Schmitz (Pierre)
Architecture All
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 1
Private No

Details

Description:

When I receive email notifications from the ArchWiki, I cannot follow the link because it is corrupted.

Here is the corrupted fragment:

Dear ArchWikiUser,

The ArchWiki page Talk:ArchWiki Translation Team (Русский) has
been changed on August 31, 2014 by Blackx, see
https://wiki.archlinux.org/index.php/Talk:ArchWiki_Translation_Team_(%D0%A0%D1%83%D1%81%D1%81%D0%BA%D0%B8%D0%B9)
for the current revision.



Closed by  Pierre Schmitz (Pierre)
Friday, 17 September 2021, 10:03 GMT
Reason for closing:  Upstream
Comment by Developer Laptander (laptander) - Monday, 01 September 2014, 19:31 GMT
I accidentally went back in the browser, so the bug report was saved before I finished it, and for some reason I cannot edit it.
I wanted to add: as you can see, the closing bracket is not included in the link, so I always have to add it manually in the browser's address bar.
Additional information: I use the Russian mail service Mail.ru.
If you need any additional information, just ask.
Comment by Jakub Klinkovský (lahwaacz) - Monday, 01 September 2014, 21:20 GMT
See this upstream bug [1], though it is not mentioned there that the problem occurs only at the end of the URL. Unfortunately, with our naming scheme for i18n pages, ArchWiki is much more prone to this bug than Wikipedia.

[1]: https://bugzilla.wikimedia.org/show_bug.cgi?id=38265
Comment by Developer Laptander (laptander) - Sunday, 28 September 2014, 10:33 GMT
I have also found bug https://bugzilla.wikimedia.org/show_bug.cgi?id=21615 from 2009. So much time has passed and it is still there! What can I do to help resolve it?
Comment by Developer Laptander (laptander) - Sunday, 28 September 2014, 10:40 GMT
It is not only at the end of the link. For example, here is another kind of broken link in an email:

...
The ArchWiki page Bluetooth Mouse (Русский) has been moved on
26 September 2014 by Blackx, see
https://wiki.archlinux.org/index.php/Bluetooth_Mouse_(%D0%A0%D1%83%D1%81%D1%81%D0%BA%D0%B8%D0%B9)
for the current revision.
...
To delete the page from your watchlist, visit
https://wiki.archlinux.org/index.php?title=Bluetooth_Mouse_(%D0%A0%D1%83%D1%81%D1%81%D0%BA%D0%B8%D0%B9)&action=unwatch

As you can see in the screenshot, in this case even the opening bracket is not included in the URL.
Comment by Developer Laptander (laptander) - Sunday, 28 September 2014, 17:04 GMT
I looked at the source of the mail, and I think the problem is NOT in the ArchWiki mailer. In the email message, all the problematic links are given not as HTML tags like <a href=blablabla>blablabla</a>, but as plain text, like https://blablabla.
The problem is in ALL the OTHER applications that try to turn that plain text into a link.

Here is a table of what works and what does not:

-------------------------------------------------------------------------------------
Application              | Mail.ru                       | Gmail.com
-------------------------------------------------------------------------------------
Web interface (Firefox)  | opening bracket not in URL    | all OK
Web interface (Chromium) | opening bracket not in URL    | all OK
KMail                    | closing bracket not in URL    | closing bracket not in URL
Thunderbird              | all OK                        | all OK
Email app for Android    | closing bracket not in URL    | closing bracket not in URL
Gmail app for Android    | cannot add non-Google account | all OK


It is strange that so many programs make the same mistake. If somebody knows where to dig next, please say so.
Comment by Jakub Klinkovský (lahwaacz) - Sunday, 28 September 2014, 19:36 GMT
@Andrew Shark: Each (web) application uses its own rules to recognize URLs in plain text. Excluding the closing parenthesis from the _end_ of the URL makes sense: _most_ URLs do not end with a parenthesis, while parentheses are commonly used in text to set off an inserted phrase. If you write

...download Arch (from https://www.archlinux.org/download) and...

then the closing parenthesis is not part of the URL. Excluding the opening parenthesis, or a parenthesis at any other position, makes less sense.
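A minimal Python sketch of the trade-off described above. Both functions are hypothetical and do not reproduce any particular mail client: `naive_linkify` always trims a trailing `)`, which matches the behavior reported for KMail in the table above, while `balanced_linkify` trims it only when the URL contains more closing than opening parentheses, similar in spirit to the extended-autolink rule in the GitHub Flavored Markdown spec.

```python
import re

# Candidate URL: "http://" or "https://" followed by a run of non-whitespace.
URL_CANDIDATE = re.compile(r"https?://\S+")

# Punctuation that usually ends a sentence rather than a URL.
TRAILING_PUNCT = ".,;:!?"


def naive_linkify(text: str) -> list[str]:
    """Strip trailing punctuation and any trailing ')' unconditionally."""
    return [m.group(0).rstrip(TRAILING_PUNCT + ")")
            for m in URL_CANDIDATE.finditer(text)]


def balanced_linkify(text: str) -> list[str]:
    """Strip a trailing ')' only if it has no matching '(' inside the URL."""
    urls = []
    for m in URL_CANDIDATE.finditer(text):
        url = m.group(0).rstrip(TRAILING_PUNCT)
        # Drop unmatched closing parentheses from the end, one at a time.
        while url.endswith(")") and url.count(")") > url.count("("):
            url = url[:-1].rstrip(TRAILING_PUNCT)
        urls.append(url)
    return urls


wiki = ("see https://wiki.archlinux.org/index.php/"
        "Bluetooth_Mouse_(%D0%A0%D1%83%D1%81%D1%81%D0%BA%D0%B8%D0%B9) for")
prose = "download Arch (from https://www.archlinux.org/download) and"

print(naive_linkify(wiki))      # the closing ')' of the wiki URL is lost
print(balanced_linkify(wiki))   # the full wiki URL is kept
print(balanced_linkify(prose))  # the sentence's ')' is still excluded
```

The balanced rule handles both examples from this thread, but it does nothing for clients that drop the opening parenthesis, which is a different and less defensible heuristic.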
Comment by Developer Laptander (laptander) - Sunday, 28 September 2014, 19:59 GMT
I understand that in many cases "(" and ")" are part of the sentence. But usually people put a space after the URL and only then close the parenthesis, so that the parser understands it is not part of the URL.

Anyway, we at Arch Linux use such URLs, and parsers must handle them, especially since our URLs also contain the matching opening parenthesis.

I think there are several ways to resolve this:
First and hardest (best): make all the other apps work properly, since this is not a MediaWiki bug but theirs.
Second and terrible (worst): use "%28" and "%29" instead of "(" and ")".
Third (and maybe not so bad): use HTML tags. In this case we could also make the links human readable, for example https://wiki.archlinux.org/index.php/Bluetooth_Mouse_(Русский) instead of https://wiki.archlinux.org/index.php/Bluetooth_Mouse_(%D0%A0%D1%83%D1%81%D1%81%D0%BA%D0%B8%D0%B9). As you can see, even here, as I write this, these links are interpreted incorrectly. But in this case some mail clients may not display the links, for security reasons.
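The readable and encoded forms in the third option differ only by percent-decoding of the UTF-8 bytes. As a minimal sketch using only the Python standard library (the variable names, and the idea of a mailer building the anchor this way, are illustrative assumptions rather than how MediaWiki actually sends mail), an HTML message could keep the encoded URL in `href` and show the readable form as link text:

```python
import html
from urllib.parse import unquote

# Encoded URL exactly as it appears in the notification email.
encoded = ("https://wiki.archlinux.org/index.php/"
           "Bluetooth_Mouse_(%D0%A0%D1%83%D1%81%D1%81%D0%BA%D0%B8%D0%B9)")

# Decode %XX escapes (interpreted as UTF-8) to get the human-readable form.
readable = unquote(encoded)

# Hypothetical HTML anchor: the href keeps the ASCII-only encoded URL,
# while the visible link text shows the readable title.
anchor = f'<a href="{html.escape(encoded)}">{html.escape(readable)}</a>'

print(readable)
print(anchor)
```

Because the link target is delimited by the `href` attribute, the mail client no longer has to guess where the URL ends, whatever characters it contains.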

What do you think we should do?
Comment by Jakub Klinkovský (lahwaacz) - Sunday, 28 September 2014, 20:33 GMT
You can't just patch the Gmail or Mail.ru webapps. Patching MediaWiki to encode the misbehaving characters is the simplest and most universal solution. Plain-text mails are used for simplicity; introducing HTML would probably create more problems than it solves (personally, I would not like it, as HTML is not easily readable in a console).
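The suggested encoding fix could look roughly like the following Python sketch. It is not MediaWiki's actual code: `title_to_url` is a hypothetical helper, and its conventions (spaces to underscores, UTF-8 percent-encoding, literal `:` and `/`) are inferred from the notification links quoted above. With `encode_parens=True`, `(` and `)` become `%28` and `%29`, so no trailing `)` is left for a client's heuristic to trim.

```python
from urllib.parse import quote

BASE = "https://wiki.archlinux.org/index.php/"


def title_to_url(title: str, encode_parens: bool) -> str:
    """Build a page URL from a wiki title (hypothetical helper).

    Spaces become underscores, as in the notification links; non-ASCII
    characters are percent-encoded as UTF-8. ':' and '/' stay literal.
    Parentheses stay literal unless encode_parens is True, in which case
    quote() turns them into %28 and %29.
    """
    path = title.replace(" ", "_")
    safe = "/:" if encode_parens else "/:()"
    return BASE + quote(path, safe=safe)


title = "Bluetooth Mouse (Русский)"
print(title_to_url(title, encode_parens=False))  # form currently sent in mails
print(title_to_url(title, encode_parens=True))   # parentheses percent-encoded
```

The encoded variant is harder for a human to read, which is the trade-off raised in the "Second" option above, but it should be linkified correctly by any client that accepts `%XX` escapes in URLs.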
Comment by Developer Laptander (laptander) - Sunday, 28 September 2014, 20:55 GMT
As I said before, the Gmail web interface is all OK. As for Mail.ru, I have already sent them a message asking them to fix this problem, so I think the Mail.ru web interface will soon be OK too.
As for KMail, I found reports of the same problem, for example:
http://lists.affinix.com/pipermail/psi-devel-affinix.com/2006-June/015168.html - the same problem from 2006
https://bugs.kde.org/show_bug.cgi?id=37833 - the same problem from 2002
https://bugs.kde.org/show_bug.cgi?id=259072 - the opposite bug, where ")" IS included in the URL

I will try to file a bug report with the KDE team to fix this, so that KMail works like Thunderbird.

As for HTML: if you have admin permissions on the ArchWiki, maybe you could make it an option in the user settings. For you it is difficult to read HTML emails in a console, but for me it is uncomfortable to see (%D0%A0%D1%83%D1%81%D1%81%D0%BA%D0%B8%D0%B9) instead of (Русский).

By the way, maybe we should just add "/" or something similar to the end of the URL to make KMail display URLs properly?
Comment by Jakub Klinkovský (lahwaacz) - Monday, 29 September 2014, 05:40 GMT
I may be an admin on ArchWiki, but there is nothing I can do about it, because I don't have access to the MediaWiki installation on Arch servers. I have access only to the web interface and mail notifications are not configurable from there. The MediaWiki installation running ArchWiki is mostly vanilla software, so the issue has to be fixed upstream first.
