Please read this before reporting a bug:
https://wiki.archlinux.org/title/Bug_reporting_guidelines
Do NOT report bugs when a package is just outdated, or it is in the AUR. Use the 'flag out of date' link on the package page, or the Mailing List.
REPEAT: Do NOT report bugs for outdated packages!
https://wiki.archlinux.org/title/Bug_reporting_guidelines
Do NOT report bugs when a package is just outdated, or it is in the AUR. Use the 'flag out of date' link on the package page, or the Mailing List.
REPEAT: Do NOT report bugs for outdated packages!
FS#58544 - [postgresql] systemd kills postgres
Attached to Project:
Arch Linux
Opened by Nick (kousu) - Friday, 11 May 2018, 06:02 GMT
Last edited by Levente Polyak (anthraxx) - Friday, 11 May 2018, 18:15 GMT
Opened by Nick (kousu) - Friday, 11 May 2018, 06:02 GMT
Last edited by Levente Polyak (anthraxx) - Friday, 11 May 2018, 18:15 GMT
|
DetailsDescription:
Trying to start postgres 10.4 from systemd now runs for a while but then stops. This was not much fun to run into, since my cursory inspection after pacman -Sy postgresql made it look like everything was okay, but then I found my site was down an hour later. What's going on is that in /usr/lib/systemd/system/postgresql.service there is [Service] Type=notify TimeoutSec=120 and if you read man systemd.service you'll see that "notify" implies "sd_notify(3) or an equivalent call"---whatever that means---and obviously postgres, if it is calling that, is doing it in the child process and not its master process, which is where the "but reception only permitted for main PID" message in the log (below) is from, so then after 2 minutes **systemd**, the jerk, chucks my perfectly functioning database to the shredder. You can see in the log that some SQL makes it through to the DB before systemd shuts it down; it's working fine, there's no need for it to die, systemd is just confused. I didn't plan ahead to compare with what 10.3 had, but I bet it was Type=simple, because if I change postgresql.service to say that and do systemctl daemon-reload systemctl start postgresql then everything works peachily. Was sd_notify() only added to postgres in 10.4? Can the service file be changed back, please? Additional info: It seems that sshd got bit by this too: https://bugzilla.redhat.com/show_bug.cgi?id=1427526 Here's my version info: [root@radio kousu]# pacman -Qi postgresql Name : postgresql Version : 10.4-1 Description : Sophisticated object-relational DBMS Architecture : x86_64 URL : https://www.postgresql.org/ Licenses : custom:PostgreSQL Groups : None Provides : None Depends On : postgresql-libs>=10.4 krb5 libxml2 readline>=6.0 openssl>=1.0.0 pam icu libsystemd libldap Optional Deps : python2: for PL/Python 2 support [installed] python: for PL/Python 3 support [installed] perl: for PL/Perl support [installed] tcl: for PL/Tcl support postgresql-old-upgrade: upgrade from previous major version using pg_upgrade Required By : None Optional For : None Conflicts With : None Replaces : None Installed Size : 28.40 MiB Packager : Levente Polyak <anthraxx@archlinux.org> Build Date : Thu 10 May 2018 08:41:50 PM EDT Install Date : Fri 11 May 2018 01:00:36 AM EDT Install Reason : Explicitly installed Install Script : Yes Validated By : Signature [root@radio kousu]# journalctl -f -u postgresql May 11 01:31:50 radio.kousu.ca systemd[1]: Starting PostgreSQL database server... May 11 01:31:50 radio.kousu.ca postgres[366]: 2018-05-11 01:31:50.255 EDT [368] LOG: listening on IPv4 address "127.0.0.1", port 5432 May 11 01:31:50 radio.kousu.ca postgres[366]: 2018-05-11 01:31:50.257 EDT [368] LOG: listening on Unix socket "/run/postgresql/.s.PGSQL.5432" May 11 01:31:50 radio.kousu.ca postgres[366]: 2018-05-11 01:31:50.278 EDT [369] LOG: database system was shut down at 2018-05-11 01:31:50 EDT May 11 01:31:50 radio.kousu.ca postgres[366]: 2018-05-11 01:31:50.284 EDT [368] LOG: database system is ready to accept connections May 11 01:31:50 radio.kousu.ca systemd[1]: postgresql.service: Got notification message from PID 368, but reception only permitted for main PID 366 May 11 01:32:32 radio.kousu.ca postgres[366]: 2018-05-11 01:32:32.295 EDT [392] ERROR: duplicate key value violates unique constraint "oc_credentials_pkey" May 11 01:32:32 radio.kousu.ca postgres[366]: 2018-05-11 01:32:32.295 EDT [392] DETAIL: Key ("user", identifier)=(kousu, password::logincredentials/credentials) already exists. May 11 01:32:32 radio.kousu.ca postgres[366]: 2018-05-11 01:32:32.295 EDT [392] STATEMENT: INSERT INTO "oc_credentials" ("user", "identifier", "credentials") VALUES($1, $2, $3) May 11 01:33:50 radio.kousu.ca systemd[1]: postgresql.service: Start operation timed out. Terminating. May 11 01:33:50 radio.kousu.ca postgres[366]: 2018-05-11 01:33:50.408 EDT [368] LOG: received smart shutdown request May 11 01:33:50 radio.kousu.ca postgres[366]: 2018-05-11 01:33:50.417 EDT [368] LOG: worker process: logical replication launcher (PID 375) exited with exit code 1 May 11 01:33:50 radio.kousu.ca postgres[366]: 2018-05-11 01:33:50.420 EDT [370] LOG: shutting down May 11 01:33:50 radio.kousu.ca postgres[366]: 2018-05-11 01:33:50.453 EDT [368] LOG: database system is shut down May 11 01:33:50 radio.kousu.ca systemd[1]: postgresql.service: Failed with result 'timeout'. May 11 01:33:50 radio.kousu.ca systemd[1]: Failed to start PostgreSQL database server. Steps to reproduce: 1) pacman -S postgresql 2) systemctl start postgresql # this will hang for 2 minutes 3) journalctl -f -u postgresql # if you watch this, you can see the messages output above |
This task depends upon
Closed by Levente Polyak (anthraxx)
Friday, 11 May 2018, 18:15 GMT
Reason for closing: Fixed
Additional comments about closing: 10.4-2
a notify unit variant will come to [testing] via 10.4-3
Friday, 11 May 2018, 18:15 GMT
Reason for closing: Fixed
Additional comments about closing: 10.4-2
a notify unit variant will come to [testing] via 10.4-3
FS#55095Another way on how to overcome this until the fix ships with updated package is by doing
# systemctl edit postgresql.service
and then pasting into a new empty file
[Service]
NotifyAccess=all
The main process should be postmaster and there should be no mismatch can someone affected check what process postmaster.pid is pointing to?
7172
systemctl status -l postgresql.service :
Process: 7168 ExecStartPre=/usr/bin/postgresql-check-db-dir ${PGROOT}/data (code=exited, status=0/SUCCESS)
Main PID: 7172 (postgres)
CGroup: /system.slice/postgresql.service
├─7172 /usr/bin/postgres -D /var/lib/postgres/data
├─7174 postgres: checkpointer process
├─7175 postgres: writer process
├─7176 postgres: wal writer process
├─7177 postgres: autovacuum launcher process
├─7178 postgres: stats collector process
└─7179 postgres: bgworker: logical replication launcher
systemd[1]: postgresql.service: Got notification message from PID 7172, but reception only permitted for main PID 7170
Whatever 7170 was, it doesn't exist anymore.