Arch Linux

Please read this before reporting a bug:
https://wiki.archlinux.org/index.php/Reporting_Bug_Guidelines

Do NOT report bugs when a package is just outdated, or it is in Unsupported. Use the 'flag out of date' link on the package page, or the Mailing List.

REPEAT: Do NOT report bugs for outdated packages!
Tasklist

FS#58544 - [postgresql] systemd kills postgres

Attached to Project: Arch Linux
Opened by Nick (kousu) - Friday, 11 May 2018, 06:02 GMT
Last edited by Levente Polyak (anthraxx) - Friday, 11 May 2018, 18:15 GMT
Task Type Bug Report
Category Packages: Extra
Status Closed
Assigned To Dan McGee (toofishes)
Levente Polyak (anthraxx)
Architecture All
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 6
Private No

Details

Description:

Trying to start postgres 10.4 from systemd now runs for a while but then stops. This was not much fun to run into, since my cursory inspection after pacman -Sy postgresql made it look like everything was okay, but then I found my site was down an hour later.

What's going on is that in /usr/lib/systemd/system/postgresql.service there is

[Service]
Type=notify
TimeoutSec=120

and if you read man systemd.service you'll see that "notify" implies "sd_notify(3) or an equivalent call"---whatever that means---and obviously postgres, if it is calling that, is doing it in the child process and not its master process, which is where the "but reception only permitted for main PID" message in the log (below) is from, so then after 2 minutes **systemd**, the jerk, chucks my perfectly functioning database to the shredder.

You can see in the log that some SQL makes it through to the DB before systemd shuts it down; it's working fine, there's no need for it to die, systemd is just confused.

I didn't plan ahead to compare with what 10.3 had, but I bet it was Type=simple, because if I change postgresql.service to say that and do

systemctl daemon-reload
systemctl start postgresql

then everything works peachily. Was sd_notify() only added to postgres in 10.4? Can the service file be changed back, please?

Additional info:

It seems that sshd got bit by this too: https://bugzilla.redhat.com/show_bug.cgi?id=1427526

Here's my version info:

[root@radio kousu]# pacman -Qi postgresql
Name : postgresql
Version : 10.4-1
Description : Sophisticated object-relational DBMS
Architecture : x86_64
URL : https://www.postgresql.org/
Licenses : custom:PostgreSQL
Groups : None
Provides : None
Depends On : postgresql-libs>=10.4 krb5 libxml2 readline>=6.0 openssl>=1.0.0 pam icu libsystemd libldap
Optional Deps : python2: for PL/Python 2 support [installed]
python: for PL/Python 3 support [installed]
perl: for PL/Perl support [installed]
tcl: for PL/Tcl support
postgresql-old-upgrade: upgrade from previous major version using pg_upgrade
Required By : None
Optional For : None
Conflicts With : None
Replaces : None
Installed Size : 28.40 MiB
Packager : Levente Polyak <anthraxx@archlinux.org>
Build Date : Thu 10 May 2018 08:41:50 PM EDT
Install Date : Fri 11 May 2018 01:00:36 AM EDT
Install Reason : Explicitly installed
Install Script : Yes
Validated By : Signature


[root@radio kousu]# journalctl -f -u postgresql
May 11 01:31:50 radio.kousu.ca systemd[1]: Starting PostgreSQL database server...
May 11 01:31:50 radio.kousu.ca postgres[366]: 2018-05-11 01:31:50.255 EDT [368] LOG: listening on IPv4 address "127.0.0.1", port 5432
May 11 01:31:50 radio.kousu.ca postgres[366]: 2018-05-11 01:31:50.257 EDT [368] LOG: listening on Unix socket "/run/postgresql/.s.PGSQL.5432"
May 11 01:31:50 radio.kousu.ca postgres[366]: 2018-05-11 01:31:50.278 EDT [369] LOG: database system was shut down at 2018-05-11 01:31:50 EDT
May 11 01:31:50 radio.kousu.ca postgres[366]: 2018-05-11 01:31:50.284 EDT [368] LOG: database system is ready to accept connections
May 11 01:31:50 radio.kousu.ca systemd[1]: postgresql.service: Got notification message from PID 368, but reception only permitted for main PID 366
May 11 01:32:32 radio.kousu.ca postgres[366]: 2018-05-11 01:32:32.295 EDT [392] ERROR: duplicate key value violates unique constraint "oc_credentials_pkey"
May 11 01:32:32 radio.kousu.ca postgres[366]: 2018-05-11 01:32:32.295 EDT [392] DETAIL: Key ("user", identifier)=(kousu, password::logincredentials/credentials) already exists.
May 11 01:32:32 radio.kousu.ca postgres[366]: 2018-05-11 01:32:32.295 EDT [392] STATEMENT: INSERT INTO "oc_credentials" ("user", "identifier", "credentials") VALUES($1, $2, $3)
May 11 01:33:50 radio.kousu.ca systemd[1]: postgresql.service: Start operation timed out. Terminating.
May 11 01:33:50 radio.kousu.ca postgres[366]: 2018-05-11 01:33:50.408 EDT [368] LOG: received smart shutdown request
May 11 01:33:50 radio.kousu.ca postgres[366]: 2018-05-11 01:33:50.417 EDT [368] LOG: worker process: logical replication launcher (PID 375) exited with exit code 1
May 11 01:33:50 radio.kousu.ca postgres[366]: 2018-05-11 01:33:50.420 EDT [370] LOG: shutting down
May 11 01:33:50 radio.kousu.ca postgres[366]: 2018-05-11 01:33:50.453 EDT [368] LOG: database system is shut down
May 11 01:33:50 radio.kousu.ca systemd[1]: postgresql.service: Failed with result 'timeout'.
May 11 01:33:50 radio.kousu.ca systemd[1]: Failed to start PostgreSQL database server.

Steps to reproduce:

1) pacman -S postgresql
2) systemctl start postgresql # this will hang for 2 minutes
3) journalctl -f -u postgresql # if you watch this, you can see the messages output above
This task depends upon

Closed by  Levente Polyak (anthraxx)
Friday, 11 May 2018, 18:15 GMT
Reason for closing:  Fixed
Additional comments about closing:  10.4-2
a notify unit variant will come to [testing] via 10.4-3
Comment by Peter Vágner (pvagner) - Friday, 11 May 2018, 09:18 GMT
Most likelly this is related to  FS#55095 
Another way on how to overcome this until the fix ships with updated package is by doing
# systemctl edit postgresql.service
and then pasting into a new empty file

[Service]
NotifyAccess=all
Comment by loqs (loqs) - Friday, 11 May 2018, 11:09 GMT
Mismatch between the main process as recorded by by PIDFile=/var/lib/postgres/data/postmaster.pid and sd_notify.
The main process should be postmaster and there should be no mismatch can someone affected check what process postmaster.pid is pointing to?
Comment by nl6720 (nl6720) - Friday, 11 May 2018, 11:57 GMT
/var/lib/postgres/data/postmaster.pid :
7172


systemctl status -l postgresql.service :
Process: 7168 ExecStartPre=/usr/bin/postgresql-check-db-dir ${PGROOT}/data (code=exited, status=0/SUCCESS)
Main PID: 7172 (postgres)
CGroup: /system.slice/postgresql.service
├─7172 /usr/bin/postgres -D /var/lib/postgres/data
├─7174 postgres: checkpointer process
├─7175 postgres: writer process
├─7176 postgres: wal writer process
├─7177 postgres: autovacuum launcher process
├─7178 postgres: stats collector process
└─7179 postgres: bgworker: logical replication launcher
systemd[1]: postgresql.service: Got notification message from PID 7172, but reception only permitted for main PID 7170

Whatever 7170 was, it doesn't exist anymore.
Comment by rim odoo (rim-odoo) - Friday, 11 May 2018, 14:44 GMT

Loading...