FS#40302 - [ntp] ntpd crash at start with pool directive

Attached to Project: Arch Linux
Opened by Sébastien Luttringer (seblu) - Sunday, 11 May 2014, 00:37 GMT
Last edited by Gaetan Bisson (vesath) - Saturday, 07 June 2014, 17:04 GMT
Task Type Bug Report
Category Packages: Extra
Status Closed
Assigned To Gaetan Bisson (vesath)
Architecture All
Severity High
Priority High
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 3
Private No

Details

With version 4.2.7.p441-1 (probably the dev branch), ntpd is crashing at start. Downgrading and ntpd works again.

Here is the systemd output. The key ipthread_create() blocking child: Resource temporarily unavailables: "pthread_create() blocking child: Resource temporarily unavailable"
mai 11 02:20:41 achille.seblu.net systemd[1]: Unit ntpd.service entered failed state.
mai 11 02:20:41 achille.seblu.net systemd[1]: Failed to start Network Time Service.
mai 11 02:20:41 achille.seblu.net systemd[1]: ntpd.service start request repeated too quickly, refusing to start.
mai 11 02:20:41 achille.seblu.net systemd[1]: Starting Network Time Service...
mai 11 02:20:41 achille.seblu.net systemd[1]: Stopping Network Time Service...
mai 11 02:20:41 achille.seblu.net systemd[1]: ntpd.service holdoff time over, scheduling restart.
mai 11 02:20:41 achille.seblu.net systemd[1]: Unit ntpd.service entered failed state.
mai 11 02:20:41 achille.seblu.net systemd[1]: ntpd.service: main process exited, code=exited, status=1/FAILURE
mai 11 02:20:41 achille.seblu.net ntpd[12718]: pthread_create() blocking child: Resource temporarily unavailable
mai 11 02:20:40 achille.seblu.net systemd[1]: Started Network Time Service.
mai 11 02:20:40 achille.seblu.net ntpd[12718]: Listening on routing socket on fd #29 for interface updates
mai 11 02:20:40 achille.seblu.net ntpd[12718]: Listen normally on 12 s6nt [2001:bc8:3173:101::1]:123
mai 11 02:20:40 achille.seblu.net ntpd[12718]: Listen normally on 11 eth0 [fe80::a2b3:ccff:feed:615a%2]:123
mai 11 02:20:40 achille.seblu.net ntpd[12718]: Listen normally on 10 eth0 [3b:c400::a2b3:ccff:feed:615a]:123
mai 11 02:20:40 achille.seblu.net ntpd[12718]: Listen normally on 9 eth0 [a0fe:1801::a2b3:ccff:feed:615a]:123
mai 11 02:20:40 achille.seblu.net ntpd[12718]: Listen normally on 8 eth0 [2001:bc8:3173:100:a2b3:ccff:feed:615a]:123
mai 11 02:20:40 achille.seblu.net ntpd[12718]: Listen normally on 7 eth0 [e07a:7901::a2b3:ccff:feed:615a]:123
mai 11 02:20:40 achille.seblu.net ntpd[12718]: Listen normally on 6 lo [::1]:123
mai 11 02:20:40 achille.seblu.net ntpd[12718]: Listen normally on 5 spnt 192.168.244.15:123
mai 11 02:20:40 achille.seblu.net ntpd[12718]: Listen normally on 4 s6nt 192.168.246.1:123
mai 11 02:20:40 achille.seblu.net ntpd[12718]: Listen normally on 3 eth0 62.210.204.83:123
mai 11 02:20:40 achille.seblu.net ntpd[12718]: Listen normally on 2 lo 127.0.0.1:123
mai 11 02:20:40 achille.seblu.net ntpd[12718]: Listen and drop on 1 v4wildcard 0.0.0.0:123
mai 11 02:20:40 achille.seblu.net ntpd[12718]: Listen and drop on 0 v6wildcard [::]:123
mai 11 02:20:40 achille.seblu.net ntpd[12718]: proto: precision = 0.051 usec (-24)
mai 11 02:20:40 achille.seblu.net systemd[1]: PID file /run/ntpd.pid not readable (yet?) after start.
mai 11 02:20:40 achille.seblu.net ntpd[12717]: ntpd 4.2.7p441@1.2483-o Tue May 6 10:14:26 UTC 2014 (1): Starting
mai 11 02:20:40 achille.seblu.net systemd[1]: Starting Network Time Service...

After a quick googling I found this: http://marc.info/?l=ntp-bugs&m=135871530509275&w=2. Maybe it can help.

Cheers,
This task depends upon

Closed by  Gaetan Bisson (vesath)
Saturday, 07 June 2014, 17:04 GMT
Reason for closing:  Upstream
Comment by Gaetan Bisson (vesath) - Sunday, 11 May 2014, 05:41 GMT
Thanks. Would you mind reporting this upstream?
Comment by Kai Hildebrandt (derhil) - Sunday, 11 May 2014, 12:03 GMT
Same here, ntpd reports:

Mai 11 14:01:43 server.derhil.de ntpd[5788]: out of memory [5788]
Mai 11 14:01:43 server.derhil.de systemd[1]: ntpd.service: main process exited, code=exited, status=1/FAILURE
Mai 11 14:01:43 server.derhil.de systemd[1]: Unit ntpd.service entered failed state.
Mai 11 14:01:43 server.derhil.de systemd[1]: ntpd.service holdoff time over, scheduling restart.
Mai 11 14:01:43 server.derhil.de systemd[1]: Stopping Network Time Service...

Cheers, Kai.
Comment by Gaetan Bisson (vesath) - Sunday, 11 May 2014, 19:09 GMT
That's all very good but since I cannot reproduce the issue we need to find the difference between my machines and yours. For starters, do you use the stock ntp.conf?
Comment by Kai Hildebrandt (derhil) - Monday, 12 May 2014, 08:53 GMT
No, i dont use the stock ntp.conf.

My ntp.conf (stratum 2 on my server):

server ntp1.fau.de
server ntp2.fau.de
server ptbtime1.ptb.de
server ptbtime2.ptb.de
server zeit.fu-berlin.de

driftfile /var/lib/ntp/ntp.drift
pidfile /run/ntpd.pid

restrict default kod limited nomodify notrap nopeer noquery
restrict -6 default kod limited nomodify notrap nopeer noquery

restrict 127.0.0.1
restrict -6 ::1

By the way, "ntpd" from the command line works fine.
Comment by Sébastien Luttringer (seblu) - Monday, 12 May 2014, 10:04 GMT
Me neither
   ntp.conf (0.2 KiB)
Comment by Gaetan Bisson (vesath) - Monday, 12 May 2014, 16:46 GMT
Please keep helping me help you:
- If ntpd works fine when ran interactively, then the issue is with its systemd service status; what if you remove lines from the service files? There is another bug report which sounds incredible but perhaps is related: https://bugs.archlinux.org/task/40319
- It would help to come up with a minimal non-working example for ntp.conf, so that we can identify precisely which option breaks with ntp/systemd.
Comment by Sébastien Luttringer (seblu) - Monday, 12 May 2014, 21:55 GMT
- Using a stock ntp.conf works fine. Once I replace server by pool, ntpd crash.
- I use a stock systemd service file.
- Launching with or wihout systemd doesn't affect *my* issue. Attached the output from command line.
Comment by Methos (Methos) - Tuesday, 13 May 2014, 20:15 GMT
Maybe it would be interesting, if the users with the problem are using the default kernel or the grsecurity patched kernel. I recognized a similar problem when using the grsecurity kernel. Everthing is fine with default kernel and ntp-4.2.7, but also fine with ntp-4.2.6 and grsecurity kernel. Talked about in the forum:
https://bbs.archlinux.org/viewtopic.php?pid=1415198#p1415198
Comment by Gaetan Bisson (vesath) - Wednesday, 14 May 2014, 02:39 GMT
Sébastien: So it indeed appears that the `pool` directive is currently broken. Would you mind reporting this upstream?

Methos: To hell with grsecurity and the forums (people go there to complain; if they wanted to report bugs and see them fixed, they'd be here on the tracker).
Comment by daimonion (demon) - Friday, 06 June 2014, 09:47 GMT
This is fixed with latest grsec kernel.
Comment by Sébastien Luttringer (seblu) - Saturday, 07 June 2014, 11:22 GMT
# pacman -Q ntp
ntp 4.2.7.p441-1
# systemctl status ntpd
● ntpd.service - Network Time Service
Loaded: loaded (/usr/lib/systemd/system/ntpd.service; enabled)
Active: failed (Result: start-limit) since sam. 2014-06-07 13:12:23 CEST; 3s ago
Process: 28868 ExecStart=/usr/bin/ntpd -g -u ntp:ntp -p /run/ntpd.pid (code=exited, status=0/SUCCESS)
Main PID: 28869 (code=exited, status=1/FAILURE)

juin 07 13:12:23 achille.seblu.net systemd[1]: Unit ntpd.service entered failed state.
juin 07 13:12:23 achille.seblu.net systemd[1]: ntpd.service holdoff time over, scheduling restart.
juin 07 13:12:23 achille.seblu.net systemd[1]: Stopping Network Time Service...
juin 07 13:12:23 achille.seblu.net systemd[1]: Starting Network Time Service...
juin 07 13:12:23 achille.seblu.net systemd[1]: ntpd.service start request repeated too quickly, refusing to start.
juin 07 13:12:23 achille.seblu.net systemd[1]: Failed to start Network Time Service.
juin 07 13:12:23 achille.seblu.net systemd[1]: Unit ntpd.service entered failed state.


@daimonion: Do you know how a kernel release could be fix this?
Comment by Sébastien Luttringer (seblu) - Saturday, 07 June 2014, 11:51 GMT
I tested with p444 and it fail too.

# /usr/bin/ntpd -g -u ntp:ntp -p /run/ntpd.pid -d
7 Jun 13:30:14 ntpd[31780]: ntpd 4.2.7p444@1.2483-o Sat Jun 7 11:26:44 UTC 2014 (1): Starting
7 Jun 13:30:14 ntpd[31780]: Command line: /usr/bin/ntpd -g -u ntp:ntp -p /run/ntpd.pid -d
7 Jun 13:30:14 ntpd[31780]: proto: precision = 0.113 usec (-23)
...
newpeer: <null>->0.0.0.0 mode 3 vers 4 poll 6 10 flags 0x101 0x8 ttl 0 key 00000000
event at 0 0.0.0.0 c01d 0d kern kernel time sync enabled
event at 0 0.0.0.0 c012 02 freq_set kernel 21.796 PPM
event at 0 0.0.0.0 c016 06 restart
select() returned -1: Interrupted system call
7 Jun 13:30:15 ntpd[31780]: pthread_create() blocking child: Resource temporarily unavailable
event at 1 0.0.0.0 c01d 0d kern kernel time sync disabled

@gaetan: I reported the bug to David L Mills, as explained here: http://ntp.org/bugs.html

@daimonion: I made a test with linux-grec and it doesn't works.
# uname -a
Linux leonard 3.14.5.201406051310-1-grsec #1 SMP PREEMPT Thu Jun 5 15:10:52 EDT 2014 x86_64 GNU/Linux
# /usr/bin/ntpd -g -u ntp:ntp -p /run/ntpd.pid -d
7 Jun 13:51:01 ntpd[2819]: ntpd 4.2.7p441@1.2483-o Tue May 6 10:14:26 UTC 2014 (1): Starting
...
event at 0 0.0.0.0 c016 06 restart
select() returned -1: Interrupted system call
7 Jun 13:51:02 ntpd[2819]: pthread_create() blocking child: Resource temporarily unavailable
event at 1 0.0.0.0 c01d 0d kern kernel time sync disabled

Comment by daimonion (demon) - Saturday, 07 June 2014, 13:35 GMT
Sébastien, all I know is that with linux-grsec 3.14.5.201406021708-1 ntpd didn't work, but it does work with linux-grsec 3.14.5.201406051310-1 (no other changes made except for paxd package upgrade, but I doubt it's related). Just to note, I keep my RTC in UTC.

# systemctl status ntpd
● ntpd.service - Network Time Service
Loaded: loaded (/usr/lib/systemd/system/ntpd.service; enabled)
Active: active (running) since суб 2014-06-07 11:04:59 CEST; 4h 29min ago
Main PID: 235 (ntpd)
CGroup: /system.slice/ntpd.service
└─235 /usr/bin/ntpd -g -u ntp:ntp -p /run/ntpd.pid

Edit: However, we might be talking about two different issues, since I used to get "cap_set_proc() failed to drop root privs: Operation not permitted" in ntpd.log before upgrade to latest grsec.
Comment by Sébastien Luttringer (seblu) - Saturday, 07 June 2014, 15:35 GMT
Are you using pool directive in your ntp.conf?
Comment by Daniel Micay (thestinger) - Saturday, 07 June 2014, 15:44 GMT
This thread seems to have been hijacked with an unrelated incompatibility between an optional grsecurity feature (kernel.grsecurity.consistent_setxid sysctl option) and ntpd. It's no longer incompatible because the feature was altered, but it was never on topic here and was never a bug.

If you want to ask a question about grsecurity, please send me an email (danielmicay@gmail.com), message me on freenode (thestinger) or at least file it as a linux-grsec bug if something works with the vanilla kernel and not the grsecurity one.

An incompatibility with a package is not a bug in that package, it's either a bug in the linux-grsec/paxd packages or an issue of missing documentation. In extremely rare cases it could be uncovering a bug in the package (for example, ld.bfd in binutils currently messes up PIE relocations) but you shouldn't make that assumption.
Comment by Gaetan Bisson (vesath) - Saturday, 07 June 2014, 17:04 GMT
In the future, please do not hijack bugs: it's easier for me to close two bugs that matter only to one person each, than close one bug that seems to matter to two different people. :)
Anyhow, there is not much we can do about the remaining bug in our package for now, but please bug me again when a fix comes up. Cheers.

Loading...