FS#39512 - [dhcpcd] racy polling

Attached to Project: Arch Linux
Opened by Kai Hendry (hendry) - Tuesday, 18 March 2014, 02:48 GMT
Last edited by Anatol Pomozov (anatolik) - Tuesday, 22 July 2014, 08:17 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To Ronald van Haren (pressh)
Architecture All
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description: dhcpcd doesn't like being killed. It also takes 100% of my CPU.
http://s.natalian.org/2014-03-17/1395037018_1364x742.png

I straced it to see a lot of polling http://s.natalian.org/2014-03-18/polling.png


Additional info:
* 6.3.1-2
* config and/or log files etc.

cat /etc/dhcpcd.conf |p
http://ix.io/b8C


Steps to reproduce:
Shutdown

or kill wifi and watch dhcpcd go mad.


I noticed `sudo dhcpcd -x` doesn't work if dhcpcd is run via systemd. `sudo systemctl stop dhcpcd@wlan0.service` takes a long time, since I guess it eventually waits before kill -9 is used.
This task depends upon

Closed by  Anatol Pomozov (anatolik)
Tuesday, 22 July 2014, 08:17 GMT
Reason for closing:  Fixed
Additional comments about closing:  fixed in dhcpcd-6.4.2
Comment by Roy Marples (rsmarples) - Tuesday, 18 March 2014, 03:07 GMT
This should already be fixed in dhcpcd-6.3.2, please update and re-test.
Comment by Kai Hendry (hendry) - Tuesday, 18 March 2014, 03:10 GMT
hmmm, how do I get that package faster? My mirror http://download.nus.edu.sg/mirror/arch/$repo/os/$arch doesn't have it. Thanks,
Comment by Roy Marples (rsmarples) - Tuesday, 18 March 2014, 03:12 GMT
No idea, I don't use arch. This page https://www.archlinux.org/packages/testing/x86_64/dhcpcd/ says it was added 4 days ago.
Comment by Kai Hendry (hendry) - Tuesday, 18 March 2014, 03:26 GMT
Still the same problem. I created a screencast of myself reproducing the issue with dhcpcd-6.3.2. http://r2d2.webconverger.org/2014-03-18/39512.html
Comment by Roy Marples (rsmarples) - Tuesday, 18 March 2014, 10:18 GMT
Could you add the word debug on a line by itself in /etc/dhcpcd.conf please and retest?
Hopefully that will add enough debug into to /var/log/messages (or wherever arch stores its syslog) to diagnose this further.
Comment by Anatol Pomozov (anatolik) - Wednesday, 19 March 2014, 15:50 GMT
@hendry the package was in [testing] repo so you have to download it from http://download.nus.edu.sg/mirror/arch/testing/os/x86_64/

But now the package has been moved to stable. Just update the system and you'll get the new dhcpcd.
Comment by Kai Hendry (hendry) - Thursday, 20 March 2014, 02:58 GMT
I have given up on dhcpcd and I'm now using systemd-networkd. http://dabase.com/blog/Good_riddance_netctl/

Nonetheless I did try debug:
$ tail -n1 /etc/dhcpcd.conf
debug


Mar 20 10:52:26 x220 dhcpcd[16194]: wlan0: IAID 96:ae:2d:3c
Mar 20 10:52:26 x220 dhcpcd[16194]: wlan0: soliciting an IPv6 router
Mar 20 10:52:26 x220 dhcpcd[16194]: wlan0: delaying Router Solicitation for LL address
Mar 20 10:52:26 x220 dhcpcd[16194]: wlan0: using hwaddr 08:11:96:ae:2d:3c
Mar 20 10:52:26 x220 dhcpcd[16194]: wlan0: reading lease `/var/lib/dhcpcd/dhcpcd-wlan0.lease'
Mar 20 10:52:26 x220 dhcpcd[16194]: wlan0: rebinding lease of 192.168.88.249
Mar 20 10:52:26 x220 dhcpcd[16194]: wlan0: sending REQUEST (xid 0x23994c3e), next in 4.3 seconds
Mar 20 10:52:30 x220 dhcpcd[16194]: wlan0: sending REQUEST (xid 0x23994c3e), next in 7.5 seconds
Mar 20 10:52:30 x220 dhcpcd[16194]: wlan0: acknowledged 192.168.88.249 from 192.168.88.1
Mar 20 10:52:30 x220 dhcpcd[16194]: wlan0: leased 192.168.88.249 for 259200 seconds
Mar 20 10:52:30 x220 dhcpcd[16194]: wlan0: renew in 129600 seconds, rebind in 226800 seconds
Mar 20 10:52:30 x220 dhcpcd[16194]: wlan0: IP address 192.168.88.249/24 already exists
Mar 20 10:52:30 x220 dhcpcd[16194]: wlan0: adding route to 192.168.88.0/24
Mar 20 10:52:30 x220 dhcpcd[16194]: wlan0: adding default route via 192.168.88.1
Mar 20 10:52:30 x220 dhcpcd[16194]: wlan0: writing lease `/var/lib/dhcpcd/dhcpcd-wlan0.lease'
Mar 20 10:52:30 x220 dhcpcd[16194]: wlan0: executing `/usr/lib/dhcpcd/dhcpcd-run-hooks' BOUND
Mar 20 10:52:30 x220 dhcpcd[16194]: forking to background

STARTS TO RACE HERE WHEN I TURN OFF WIFI ON H/W SWITCH AND TURN IT BACK ON.

Mar 20 10:52:30 x220 systemd[1]: Started dhcpcd on wlan0.
Mar 20 10:52:30 x220 dhcpcd[16555]: wlan0: sending ARP announce (1 of 2), next in 2.0 seconds
Mar 20 10:52:32 x220 dhcpcd[16555]: wlan0: sending ARP announce (2 of 2)
Mar 20 10:54:45 x220 systemd[1]: Stopping dhcpcd on wlan0...
Mar 20 10:54:45 x220 dhcpcd[23706]: sending signal TERM to pid 16555
Mar 20 10:54:45 x220 dhcpcd[23706]: waiting for pid 16555 to exit
Mar 20 10:54:45 x220 dhcpcd[23706]: dhcpcd[23706]: sending signal TERM to pid 16555
Mar 20 10:54:45 x220 dhcpcd[23706]: dhcpcd[23706]: waiting for pid 16555 to exit
Mar 20 10:54:55 x220 systemd[1]: dhcpcd@wlan0.service: control process exited, code=exited status=1
Mar 20 10:54:55 x220 dhcpcd[23706]: dhcpcd[23706]: pid 16555 failed to exit
Mar 20 10:55:34 x220 systemd[1]: dhcpcd@wlan0.service: main process exited, code=killed, status=9/KILL
Mar 20 10:55:34 x220 systemd[1]: Stopped dhcpcd on wlan0.
Mar 20 10:55:34 x220 systemd[1]: Unit dhcpcd@wlan0.service entered failed state.


6.3.2-1
Comment by Roy Marples (rsmarples) - Thursday, 17 April 2014, 08:51 GMT Comment by Anatol Pomozov (anatolik) - Sunday, 01 June 2014, 17:21 GMT
Could anyone confirm that upstream patch from previous comment fixes this issue?
Comment by Johannes Löthberg (demize) - Tuesday, 03 June 2014, 11:26 GMT
Might want to build a package with it so people who has experienced it can just install it and try without having to build it themselves.
Comment by Ronald van Haren (pressh) - Tuesday, 03 June 2014, 11:33 GMT
I'll add a package to [testing] in a couple of hours.
Comment by Ronald van Haren (pressh) - Tuesday, 03 June 2014, 19:44 GMT
please report if 6.3.2-2 in [testing] fixes it for you
Comment by Marcelo Garlet Millani (mgmillani) - Thursday, 05 June 2014, 10:43 GMT
I've just downloaded the newest version (2014-06-04 23:54) from the repository (http://roy.marples.name/projects/dhcpcd/timeline?r=trunk), installed it with ABS, but the problem persists (at least the wifi-part).
Comment by Roy Marples (rsmarples) - Thursday, 05 June 2014, 16:53 GMT
Can you provide an strace snippet like the OP did to show which fd dhcpcd is spinning on?
I just cannot replicate this, but my Linux machine lacks a wireless card as well.
Comment by Marcelo Garlet Millani (mgmillani) - Friday, 06 June 2014, 07:36 GMT
Here is the part of strace that keeps repeating itself.
Comment by Roy Marples (rsmarples) - Friday, 06 June 2014, 08:09 GMT
Ahhh, it seems that I'm just not checking revents for POLLERR.
After reading the man page I think I can just test for non zero instead of specific flags.... the handlers will take care of things.

Can you test this patch please and let me know if it works? If so I'll commit it right away .... a new dhcpcd release should be done soon and I'd like to include it.
Comment by Marcelo Garlet Millani (mgmillani) - Friday, 06 June 2014, 21:44 GMT
The patch indeed fixes the problem.
Comment by Roy Marples (rsmarples) - Friday, 06 June 2014, 23:04 GMT Comment by Roy Marples (rsmarples) - Monday, 14 July 2014, 12:05 GMT
Fixed in dhcpcd-6.4.1
Comment by Anatol Pomozov (anatolik) - Sunday, 20 July 2014, 14:27 GMT
dhcpcd-6.4.2 is in [testing] right now. Could you please install and test it?

Loading...