FS#44848 - [coreutils] tr produces messy nonsense when using UTF-8 characters
Attached to Project:
Arch Linux
Opened by Mathias Steiger (mathiassteiger) - Monday, 04 May 2015, 13:58 GMT
Last edited by Sébastien Luttringer (seblu) - Wednesday, 13 May 2015, 22:43 GMT
Details
/usr/bin/tr is owned by coreutils 8.23-1
Example: echo -en "asdgf\nadsfdssdsaf" | tr '\n' '≠'
Expected output: asdgf≠adsfdssdsaf
Actual output: asdgfâadsfdssdsaf
Hex: 7361 6764 e266 6461 6673 7364 6473 6173 0066

Example: echo -en "asdgf\nadsfdssdsaf" | tr 'asd' '≠'
Expected output: ≠gf\nadsfdssdsaf
Actual output: ≠gf\n⠉f âf
Hex: 89e2 67a0 0a66 a0e2 6689 89a0 a089 e289 0066

Example: echo -en "asdgf≠adsfdssdsaf" | tr '≠' '\n'
Expected output: asdgf\nadsfdssdsaf
Actual output: asdgf\n\nadsfdssdsaf
Hex: 7361 6764 0a66 0a0a 6461 6673 7364 6473 6173 0066

This used to work for years; now the output is garbled. Locale settings, the terminal used, and any aliases that are set make no difference.
Closed by Sébastien Luttringer (seblu)
Wednesday, 13 May 2015, 22:43 GMT
Reason for closing: Upstream
Have you reported this bug upstream?
<cut>
Currently ‘tr’ fully supports only single-byte characters.
Eventually it will support multibyte characters; when it does, the ‘-C’
option will cause it to complement the set of characters, whereas ‘-c’
will cause it to complement the set of values. This distinction will
matter only when some values are not characters, and this is possible
only in locales using multibyte encodings when the input contains
encoding errors.
</cut>
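As a rough illustration of the single-byte behaviour described in the excerpt, the sketch below spells out the UTF-8 bytes of '≠' as octal escapes and passes them to tr explicitly. It assumes the script is saved as UTF-8. The octal escapes and the LC_ALL=C setting are illustrative additions, not part of the original report. The last command is one possible workaround and assumes GNU sed.

```shell
# '≠' (U+2260) is three bytes in UTF-8: e2 89 a0.
printf '≠' | od -An -tx1

# Writing those bytes as octal escapes shows what a byte-oriented tr receives:
# SET1 ('\n') has one byte, so only the first byte of SET2 (0xe2) is used.
# The hex dump contains a lone e2 where the newline was, as in the first example.
printf 'asdgf\nadsfdssdsaf' | LC_ALL=C tr '\n' '\342\211\240' | od -An -tx1

# Possible workaround (assumes GNU sed): replace byte strings, not single bytes.
# Read every line into the pattern space, then turn each newline into '≠'.
printf 'asdgf\nadsfdssdsaf' | sed ':a;N;$!ba;s/\n/≠/g'
```

The same reasoning covers the other two examples: with tr 'asd' '≠' each of a, s and d is mapped to one of the three bytes, and with tr '≠' '\n' each of the three bytes is mapped to a newline.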
If it used to work for years, maybe upstream will accept your report. Nonetheless, I will close it here.