FS#79582 - [postgresql] Invalid multibyte char w/ full text search + plperl extension
Attached to Project:
Arch Linux
Opened by Matt (mattarch) - Tuesday, 05 September 2023, 19:09 GMT
Last edited by Toolybird (Toolybird) - Wednesday, 22 November 2023, 00:44 GMT
Opened by Matt (mattarch) - Tuesday, 05 September 2023, 19:09 GMT
Last edited by Toolybird (Toolybird) - Wednesday, 22 November 2023, 00:44 GMT
|
Details
Description:
This problem happens consistently with PostgreSQL 15.4 but has never been seen on 14, even after a clean install of the entire OS on newly formatted partitions. It happens when all 3 conditions are satisfied: 1. TEXT @@ to_tsquery() is executed. 2. The TEXT column contains a multi-byte UTF-8 character. 3. The statement is run after a simple plperl function call in the same session. The error is: ERROR: invalid multibyte character for locale HINT: The server's LC_CTYPE locale is probably incompatible with the database encoding. Additional info: * package version: postgresql 15.4-2 Steps to reproduce: $ createdb -U postgres test $ psql -U postgres test psql (15.4) Type "help" for help. test=# \l test List of databases Name | Owner | Encoding | Collate | Ctype | ICU Locale | Locale Provider | Access privileges ------+----------+----------+-------------+-------------+------------+-----------------+------------------- test | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | libc | (1 row) test=# CREATE EXTENSION IF NOT EXISTS plperl; CREATE EXTENSION test=# CREATE FUNCTION public.simple_quoter(text) RETURNS TEXT LANGUAGE plperl AS $_X$ return "'$_[0]'" $_X$; CREATE FUNCTION test=# CREATE TABLE t (c TEXT); CREATE TABLE test=# INSERT INTO t VALUES ('Queensrÿche'); INSERT 0 1 test=# SELECT * FROM t WHERE to_tsquery('english','Queen:*') @@ to_tsvector('english',c); ERROR: invalid multibyte character for locale HINT: The server's LC_CTYPE locale is probably incompatible with the database encoding. The same error also occurs when running the two statements in the same session: SELECT simple_quoter('hi'); SELECT c @@ to_tsquery('english','Queen:*') FROM t; If the second is run without the first, there is not problem. So it somehow involves use of the `plperl` extension prior to the second query. There is also an open Stackoverflow issue of the problem: https://stackoverflow.com/questions/77033003/bug-with-postgresql-full-text-search-multi-byte-utf-8-characters-plperl-exte |
This task depends upon
Closed by Toolybird (Toolybird)
Wednesday, 22 November 2023, 00:44 GMT
Reason for closing: No response
Wednesday, 22 November 2023, 00:44 GMT
Reason for closing: No response
I followed your steps and see the same error. But it's hard to see how this is an Arch packaging issue. Have you reported the problem upstream? You might have better luck visiting the proper Arch support channels (Forum/IRC/Mailing Lists/Reddit/etc) to see if anyone has any ideas. Please let us know what you find out.
One possibility is the patch applied to the plperl makefile: https://gitlab.archlinux.org/archlinux/packaging/packages/postgresql/-/blob/main/postgresql-perl-rpath.patch
> Have you reported the problem upstream?
No - but thanks for confirming the issue and the suggestions.
You can test if that is the cause by rebuilding the package with [1] applied to the PKGBUILD. You could also try 16rc1 with [2] applied to the PKGBUILD instead of [1].
You could also attempt to bisect the bug [3].
[1] PKGBUILD.diff.1
[2] PKGBUILD.diff.2
[3] https://wiki.archlinux.org/title/Bisecting_bugs_with_Git
```
diff --git a/postgresql-check-db-dir b/postgresql-check-db-dir
index a2134cd..16d6559 100755
--- a/postgresql-check-db-dir
+++ b/postgresql-check-db-dir
@@ -16,9 +16,9 @@ then
fi
# PGMAJORVERSION is major version
-PGMAJORVERSION=15
+PGMAJORVERSION=16
# PREVMAJORVERSION is the previous major version
-PREVMAJORVERSION=14
+PREVMAJORVERSION=15
# Check for the PGDATA structure
if [ -f "$PGDATA/PG_VERSION" ] && [ -d "$PGDATA/base" ]
```
in order for the server to start via systemd.
Laurenz Albe in the comments of https://stackoverflow.com/questions/77033003/bug-with-postgresql-full-text-search-multi-byte-utf-8-characters-plperl-exte claims to not see the error w/ PostgreSQL v16.
```
export LANG=C
```
to the `check()` function of `PKGBUILD`. Running `makepkg` on the prior tag 15.3-2 fails, ending with
```
============== running regression test queries ==============
test plperl_setup ... ok 45 ms
test plperl ... ok 24 ms
test plperl_lc ... ok 19 ms
test plperl_trigger ... ok 20 ms
test plperl_shared ... ok 20 ms
test plperl_elog ... FAILED 15 ms
test plperl_util ... ok 14 ms
test plperl_init ... ok 21 ms
test plperlu ... ok 10 ms
test plperl_array ... ok 15 ms
test plperl_call ... FAILED 14 ms
test plperl_transaction ... ok 20 ms
test plperl_plperlu ... ok 17 ms
============== shutting down postmaster ==============
=======================
2 of 13 tests failed.
=======================
The differences that caused some tests to fail can be viewed in the
file "/home/matt/gitlab/postgresql/src/postgresql-15.3/src/pl/plperl/regression.diffs". A copy of the test summary that you see
above is saved in the file "/home/matt/gitlab/postgresql/src/postgresql-15.3/src/pl/plperl/regression.out".
make[2]: *** [GNUmakefile:119: check] Error 1
make[2]: Leaving directory '/home/matt/gitlab/postgresql/src/postgresql-15.3/src/pl/plperl'
make[1]: *** [Makefile:35: check-plperl-recurse] Error 2
make[1]: Leaving directory '/home/matt/gitlab/postgresql/src/postgresql-15.3/src/pl'
make: *** [GNUmakefile:71: check-world-src/pl-recurse] Error 2
make check-world failure: ./src/pl/plperl/regression.diffs
diff -U3 /home/matt/gitlab/postgresql/src/postgresql-15.3/src/pl/plperl/expected/plperl_elog_1.out /home/matt/gitlab/postgresql/src/postgresql-15.3/src/pl/plperl/results/plperl_elog.out
--- /home/matt/gitlab/postgresql/src/postgresql-15.3/src/pl/plperl/expected/plperl_elog_1.out 2023-05-08 17:13:20.000000000 -0400
+++ /home/matt/gitlab/postgresql/src/postgresql-15.3/src/pl/plperl/results/plperl_elog.out 2023-09-09 21:05:06.004063285 -0400
@@ -76,6 +76,7 @@
RETURN 1;
END;
$$;
+WARNING: could not determine encoding for locale "en_US.UTF-8": codeset is "ANSI_X3.4-1968"
select die_caller();
NOTICE: caught die
die_caller
diff -U3 /home/matt/gitlab/postgresql/src/postgresql-15.3/src/pl/plperl/expected/plperl_call.out /home/matt/gitlab/postgresql/src/postgresql-15.3/src/pl/plperl/results/plperl_call.out
--- /home/matt/gitlab/postgresql/src/postgresql-15.3/src/pl/plperl/expected/plperl_call.out 2023-05-08 17:13:20.000000000 -0400
+++ /home/matt/gitlab/postgresql/src/postgresql-15.3/src/pl/plperl/results/plperl_call.out 2023-09-09 21:05:06.097397160 -0400
@@ -64,6 +64,7 @@
RAISE NOTICE '_a: %, _b: %', _a, _b;
END
$$;
+WARNING: could not determine encoding for locale "en_US.UTF-8": codeset is "ANSI_X3.4-1968"
NOTICE: a: 10, b:
NOTICE: _a: 10, _b: 20
DROP PROCEDURE test_proc1;
==> ERROR: A failure occurred in check().
Aborting...
```
The test failures involving plperl and locale/encoding are closely related to the issues posted above.
I wonder why `LANG=C` was needed? If it was due to the above errors, perhaps there is a better resolution that would be compatible with environments with `LANG=en_US.UTF-8`?
Clearly, it was done to allow `make check' to pass. What happens if you follow the advice here [1] and/or here [2]?
[1] https://dba.stackexchange.com/questions/65268/how-can-i-load-utf8-into-plperl-by-default
[2] https://stackoverflow.com/questions/20355007/why-does-this-postgres-stored-procedure-want-to-use-utf8
I've tried variations on this theme, based on the above perl5 commit, e.g.
```
plperl.on_init = 'use utf8; use re; package utf8; use Unicode::UCD; require "unicore/UCD.pl";'
```
plus various subsets/permutations of these init commands but they result in the same error.