FS#24113 - [initscripts] dhcpcd time out

Attached to Project: Arch Linux
Opened by tiny (tiny) - Thursday, 05 May 2011, 12:58 GMT
Last edited by Tom Gundersen (tomegun) - Wednesday, 18 May 2011, 15:39 GMT
Task Type: Bug Report
Category: System
Status: Closed
Assigned To: Tom Gundersen (tomegun)
Architecture: i686
Severity: High
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 0
Private: No

Details

Description:
NIC fails to get IP over DHCP on startup.
Works after executing "/etc/rc.d/network ifup eth1"

Maybe it's related to another issue I'm having.
Udev fails to set permanent net rules. Again, after login
and reloading the module, the rules are magically created.

Still, after reboot eth1 fails to come up.


Additional info:
* package version(s)
dhcpcd 5.2.12-1
udev 167-1
kernel26 2.6.38.4-1

* config and/or log files etc.
rc.conf...
#Static IP example
#eth0="eth0 192.168.0.2 netmask 255.255.255.0 broadcast 192.168.0.255"
eth1="dhcp"
INTERFACES=(eth1)

# Routes to start at boot-up (in this order)
# Declare each route then list in ROUTES
# - prefix an entry in ROUTES with a ! to disable it
#
gateway="default gw 192.168.0.1"
ROUTES=(!gateway)

# Setting this to "yes" will skip network shutdown.
# This is required if your root device is on NFS.
NETWORK_PERSIST="no"

# Enable these network profiles at boot-up. These are only useful
# if you happen to need multiple network configurations (ie, laptop users)
# - set to 'menu' to present a menu during boot-up (dialog package required)
# - prefix an entry with a ! to disable it
#
# Network profiles are found in /etc/network.d
#
# This now requires the netcfg package
#
#NETWORKS=(main)

# -----------------------------------------------------------------------
# DAEMONS
# -----------------------------------------------------------------------
#
# Daemons to start at boot-up (in this order)
# - prefix a daemon with a ! to disable it
# - prefix a daemon with a @ to start it up in the background
#
# If something other takes care of your hardware clock (ntpd, dual-boot...)
# you should disable 'hwclock' here.
#
DAEMONS=(hwclock syslog-ng network netfs crond sshd)



/var/log/everything
May 5 14:21:43 websrv kernel: [ 11.541821] e1000: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
May 5 14:21:43 websrv dhcpcd[1449]: eth1: rebinding lease of 192.168.254.98
May 5 14:21:53 websrv dhcpcd[1449]: eth1: broadcasting for a lease
May 5 14:22:13 websrv dhcpcd[1449]: timed out
May 5 14:22:13 websrv crond[1473]: /usr/sbin/crond 4.4 dillon's cron daemon, started with loglevel info
May 5 14:22:14 websrv kernel: [ 42.508520] NET: Registered protocol family 10
May 5 14:22:25 websrv kernel: [ 53.216668] eth1: no IPv6 routers present


Steps to reproduce:
Install fresh ArchLinux. Configure NIC for dhcp. Boot.

Closed by Tom Gundersen (tomegun)
Wednesday, 18 May 2011, 15:39 GMT
Reason for closing: Fixed
Additional comments about closing: Probably fixed by udev-168-2. Please reopen if it reoccurs.
Comment by Tom Gundersen (tomegun) - Thursday, 05 May 2011, 17:04 GMT
What is the output of "ifconfig -a"? If you have several NICs, how do you know that eth1 is the right one on a fresh install?
Comment by tiny (tiny) - Friday, 06 May 2011, 04:10 GMT
ifconfig reports that eth1 is "RUNNING", i.e. the link is present. There's no IP set, of course, since DHCP times out.
The interface _gets_ set if I issue the command "/etc/rc.d/network ifup eth1", as I stated above.
Reboot the machine or restart the network and I lose the interface again!

ifconfig after NIC is set:

[root@websrv ~]# ifconfig
eth1 Link encap:Ethernet HWaddr 00:13:20:31:51:C9
inet addr:192.168.254.98 Bcast:192.168.254.255 Mask:255.255.255.0
inet6 addr: fe80::213:20ff:fe31:51c9/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:338230 errors:0 dropped:338 overruns:0 frame:0
TX packets:482 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:22918883 (21.8 Mb) TX bytes:71254 (69.5 Kb)

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:4 errors:0 dropped:0 overruns:0 frame:0
TX packets:4 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:264 (264.0 b) TX bytes:264 (264.0 b)

[root@websrv ~]#




[root@websrv ~]# ifconfig -a
eth0 Link encap:Ethernet HWaddr 00:13:20:31:51:CA
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)

eth1 Link encap:Ethernet HWaddr 00:13:20:31:51:C9
inet addr:192.168.254.98 Bcast:192.168.254.255 Mask:255.255.255.0
inet6 addr: fe80::213:20ff:fe31:51c9/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:339105 errors:0 dropped:338 overruns:0 frame:0
TX packets:492 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:22978387 (21.9 Mb) TX bytes:73482 (71.7 Kb)

eth2 Link encap:Ethernet HWaddr 00:40:F4:BA:DF:82
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
Interrupt:21 Base address:0x4c00

eth3 Link encap:Ethernet HWaddr 00:40:F4:BA:DF:6A
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
Interrupt:22 Base address:0x4800

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:4 errors:0 dropped:0 overruns:0 frame:0
TX packets:4 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:264 (264.0 b) TX bytes:264 (264.0 b)

[root@websrv ~]#
Comment by Tom Gundersen (tomegun) - Sunday, 08 May 2011, 00:09 GMT
Does applying my patch from your other bug report also fix this issue?
Comment by tiny (tiny) - Monday, 09 May 2011, 06:50 GMT
Unfortunately no. That patch didn't generate the 70-persistent-net.rules file.

Even if it had, I somehow doubt it would help, since that NIC always
defaults to `eth1' (RUNNING). I only have one UTP cable plugged in and it's plugged
into the correct NIC.
I don't see how correct udev rules would fix getting an IP over dhcpcd.

ethtool output:
http://paste.pocoo.org/show/385660/
Comment by tiny (tiny) - Monday, 09 May 2011, 13:23 GMT
It's confirmed. This bug is not related to
 FS#24115  - [initscripts][udev]70-persistent-net.rules is not generated

I'm still left with an uninitialized NIC that's configured for DHCP.
Comment by Tom Gundersen (tomegun) - Monday, 09 May 2011, 13:45 GMT
@tiny: thanks for checking. I only thought it might be related if somehow the names of your network devices got swapped, but I guess that would not make sense (as you manage to make it work manually).

Which is the last version of initscripts confirmed to work? Did you use this setup with 2011.02.1? I have looked through all the commits to the network logic between 2011.02.1 and 2011.04.1 and nothing stands out. The only related change is the parsing of the interface name, but as far as I can tell this is not the problem.

Are you familiar with git? If so, can I ask you to clone <git://projects.archlinux.org/initscripts.git>, and bisect the problem?

Thanks for your reporting so far!
Comment by tiny (tiny) - Monday, 09 May 2011, 14:18 GMT
I would bisect, but I have no clue what to pick for a starting/working commit
since this is a new box with a fresh install.
I will try to test when I manage to squeeze in some time.

Regards.
Comment by tiny (tiny) - Monday, 09 May 2011, 14:28 GMT
Hmm ... if I decide to bisect I obviously need to clone initscripts, but
then I also need to decide which initscripts are essential to the problem
and copy them over to the system's /etc directory and subdirectories.

How do I decide which scripts to copy over? Even if I copy them over I still need
to edit some of them, like rc.conf and maybe even some more. Sounds like a PITA :)
Comment by Tom Gundersen (tomegun) - Monday, 09 May 2011, 15:13 GMT
Working commit: the git tags correspond to the releases, so if you know that it worked with 2011.02.1, then choose that as a working commit.

The files to copy: do not copy anything that you might have edited (rc.conf rc.local rc.local.shutdown). I guess what you want to copy is:
to /etc
rc.sysinit, rc.multi
to /etc/rc.d
network, functions

I'd suggest making a little batch script to copy the right files so you don't have to do it manually :-)

Thanks for looking into it!
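
The copy step suggested above could look roughly like the following. This is only a sketch: the function name `install_initscripts` is hypothetical, and it assumes the four files sit at the top level of the initscripts checkout, which should be verified against the actual repository layout.

```bash
#!/bin/sh
# Hypothetical helper: copy the four files named in the comment above from an
# initscripts checkout into /etc and /etc/rc.d.
# Assumption: the files live at the top level of the checkout.
# rc.conf, rc.local and rc.local.shutdown are deliberately left untouched.
install_initscripts() {
    src="$1"        # path to the initscripts git checkout
    root="${2:-}"   # optional alternate root, useful for a dry run

    mkdir -p "$root/etc/rc.d" || return 1

    # Files that go to /etc
    for f in rc.sysinit rc.multi; do
        cp -p -- "$src/$f" "$root/etc/$f" || return 1
    done

    # Files that go to /etc/rc.d
    for f in network functions; do
        cp -p -- "$src/$f" "$root/etc/rc.d/$f" || return 1
    done
}
```

During a bisection this would be run as root after each step (for example `install_initscripts ~/initscripts`), followed by a reboot and then `git bisect good` or `git bisect bad` depending on whether eth1 came up. Backing up the installed files first is advisable.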
Comment by Tom Gundersen (tomegun) - Wednesday, 11 May 2011, 13:42 GMT
@tiny: I missed that you said that the box is new. In case this is not a regression, I would appreciate it if you could just check the previous initscripts release to see if it happens there too.

Assuming this is not a regression:

From the original bug report it looks like the problem is that

> May 5 14:22:14 websrv kernel: [ 42.508520] NET: Registered protocol family 10

happens after dhcpcd times out. It looks like udev settles before the network device is fully initialized. I think there was a recent udev commit that should fix the settling finishing too early, so please test again when the new udev is in testing.
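
To check this ordering on another boot, the relevant lines can be pulled out of the log. A minimal sketch: the function name `net_boot_timeline` is hypothetical, the default log path is the one quoted in the report, and the patterns are copied from the log excerpt there.

```bash
#!/bin/sh
# Hypothetical helper: print only the log lines relevant to the ordering
# question (NIC link-up, dhcpcd activity, protocol family registration).
# The default path is the one quoted in the report; it may differ elsewhere.
net_boot_timeline() {
    log_file="${1:-/var/log/everything}"
    grep -E 'NIC Link is Up|dhcpcd\[[0-9]+\]|NET: Registered protocol family' "$log_file"
}
```

If the "Registered protocol family" line still appears after the dhcpcd "timed out" line, the ordering described above is still present.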
Comment by Tom Gundersen (tomegun) - Friday, 13 May 2011, 23:07 GMT
@tiny: I discussed this a bit with falconindy and we have some new suggestions:

Try adding your network module to the MODULES array in rc.conf (this will force it to be loaded early in boot).

If that does not work: try removing your sysctl.conf to see if that helps (apparently sysctl might reload some modules if you are really unlucky).

Any luck with the bisection?
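
The first suggestion amounts to a one-line change in rc.conf. A sketch, assuming the NIC driver is e1000 (taken from the kernel log line in the original report; `lsmod` shows the module actually in use):

```bash
# /etc/rc.conf (fragment)
# Assumption: e1000 is the driver for eth1, as the kernel log in the report
# suggests ("e1000: eth1 NIC Link is Up"). Substitute the real module name.
# If the array already lists other modules, add e1000 to them rather than
# replacing the line.
MODULES=(e1000)
```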
Comment by Tom Gundersen (tomegun) - Sunday, 15 May 2011, 14:57 GMT
I downgraded the bug report from Critical to High as we don't know that it was a regression. I also added falconindy to the report in case he has any further suggestions.
Comment by Tom Gundersen (tomegun) - Tuesday, 17 May 2011, 18:48 GMT
Please reconfirm this bug with udev-168-2. It was probably fixed there.
