FS#36749 - [archiso] dhcpcd fails to start when using PXE boot on the archiso media.

Attached to Project: Release Engineering
Opened by Alejandro Liu (alejandro_liu) - Saturday, 31 August 2013, 12:31 GMT
Last edited by David Runge (dvzrv) - Monday, 29 March 2021, 17:31 GMT
Task Type: Bug Report
Category: ArchISO
Status: Closed
Assigned To: David Runge (dvzrv)
Architecture: All
Severity: Medium
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 1
Private: No

Details

In archiso-2013.08.01, dhcpcd fails to start properly. The previous version I tested, 2013.05.01, worked properly.

With the new "Predictable Interface Names", udev renames network interfaces, and this does not work in archiso.

In archiso, when booting via PXE, the network interface is initialized by the "archiso_pxe_common" hook.

When systemd-udev then tries to rename the network interface to its "predictable interface name", it fails.

Later, a udev rule in "/etc/udev/rules.d/81-dhcpcd.rules" enables dhcpcd on the "predictable interface name", but this fails because no interface with that name exists.


Closed by  David Runge (dvzrv)
Monday, 29 March 2021, 17:31 GMT
Reason for closing:  Fixed
Additional comments about closing:  Fixed with https://gitlab.archlinux.org/archlinux/archiso/-/merge_requests/106
Comment by Dave Reisner (falconindy) - Saturday, 31 August 2013, 12:51 GMT
So if you add net.ifnames=0 to the kernel commandline, dhcpcd starts?
Comment by Alejandro Liu (alejandro_liu) - Sunday, 01 September 2013, 06:21 GMT
Yes. Since it disables the "Predictable Interface Names" feature, dhcpcd then starts properly.

udev is unable to rename ethX to these names because the interface is up and running (having been configured by the PXE hook). So when it tries, it reports that the device is busy.
Comment by Dave Reisner (falconindy) - Sunday, 01 September 2013, 12:14 GMT
> Later, there is a udev rule in "/etc/udev/rules.d/81-dhcpcd.rules" which enables dhcpcd on the "predictable interface name" but this fails as this name is not there.
You're wrong. The rule creates dhcpcd instances on ADD events for network interfaces. It does *not* explicitly cater to the renamed interface names.

> udev is unable to rename ethX to these names due to the fact that the interface is up and running
Right, and the fact that it can't rename anything is fine.

You seem to be stating the following:
1) udev tries to rename the interface and it can't. Therefore, the interface remains eth0, but dhcpcd fails to start on eth0.
2) udev doesn't try to rename the interface. Therefore, the interface remains eth0 and dhcpcd starts on eth0.

...which is quite contradictory. There's always going to be an ADD event for eth0 which pulls in a dhcpcd instance for eth0, and your bug report doesn't make it clear why in one case it fails, and the other case it succeeds.
Comment by Alejandro Liu (alejandro_liu) - Sunday, 01 September 2013, 20:24 GMT
If I use "net.ifnames=0" then udev doesn't attempt to rename the device and "dhcpcd" starts just fine on eth0.

If I do NOT use "net.ifnames=0", then udev first tries to rename the interface to the "predictable name" (and fails), and then "dhcpcd" tries to start on the "predictable name" (apparently unaware that the rename failed) and also fails.

I admit I am not too familiar with udev and how "ADD" events are triggered, but if I boot without "net.ifnames=0", then "systemctl | grep dhcpcd" shows that dhcpcd is configured on enp0s3 (the predictable name).
If I boot with "net.ifnames=0", then "systemctl | grep dhcpcd" shows that dhcpcd is configured on eth0.
My assumption here is that the "/etc/udev/rules.d/81-dhcpcd.rules" is changing the dhcpcd configuration.

Either way, the "net.ifnames=0" workaround works, so I am happy.

Comment by Dave Reisner (falconindy) - Sunday, 01 September 2013, 23:02 GMT
Seems like it's working as intended... interfaces are renamed in the initramfs.

You've yet to post any errors from dhcpcd (or anywhere else) -- care to share?
Comment by Alejandro Liu (alejandro_liu) - Monday, 02 September 2013, 08:27 GMT
Will download the latest iso image and try this again.
Comment by Alejandro Liu (alejandro_liu) - Monday, 02 September 2013, 14:40 GMT
I am enclosing a copy of the output of "journalctl" of this.

So what is going on is this:

1. archiso_pxe_common configures the network as "eth0" in the initramfs.
2. systemd and udev get started.
3. udevd tries to rename "eth0" to "enp3s0" and fails.
4. systemd starts dhcpcd@enp3s0 and also fails.




Comment by Alejandro Liu (alejandro_liu) - Monday, 02 September 2013, 14:46 GMT
For your reference I am also including the output of "journalctl" when the "net.ifnames=0" is used.

In this situation:

1. archiso_pxe_common configures the network as "eth0" in the initramfs.
2. systemd and udev get started.
3. net.ifnames=0 prevents udev from trying to rename "eth0" to "enp3s0".
4. systemd starts dhcpcd@eth0 and succeeds.
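
The difference between the two boots comes down to whether "net.ifnames=0" is present on the kernel command line. A minimal sketch of that check follows; the helper name ifnames_disabled is hypothetical and not part of archiso, and it only looks for the exact token "net.ifnames=0".

```shell
# Sketch: succeed if the given kernel command line contains net.ifnames=0.
# ifnames_disabled is a hypothetical helper name, not an archiso function.
ifnames_disabled() {
    # $1: kernel command line; defaults to the running kernel's /proc/cmdline
    cmdline=${1-$(cat /proc/cmdline)}
    # Pad with spaces so the parameter is matched as a whole word
    case " $cmdline " in
        *" net.ifnames=0 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example with a literal command line rather than the live one
if ifnames_disabled 'archisobasedir=/arch ip=:::::eth0:dhcp net.ifnames=0'; then
    echo "renaming disabled"
fi
```

Called without an argument, the helper reads /proc/cmdline of the running system instead.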

Comment by Gerardo Exequiel Pozzi (djgera) - Monday, 02 September 2013, 15:16 GMT
Thanks for the reminder; I know about this issue. The workaround should be documented somewhere. But I am also thinking about setting the eth link down from the initramfs once everything is done, so that no workaround is needed. What do you think?
Comment by Alejandro Liu (alejandro_liu) - Tuesday, 03 September 2013, 05:51 GMT
Setting the eth link down would be possible.

There is a copytoram=n option that would prevent that from working, so people using that option would still have to use the net.ifnames=0 workaround.

Comment by Gerardo Exequiel Pozzi (djgera) - Tuesday, 03 September 2013, 15:18 GMT
copytoram=n is not recommended when using PXE with NFS or NBD; it should be used only if you know what you are doing when setting DHCP server options.
Comment by Gerardo Exequiel Pozzi (djgera) - Friday, 27 September 2013, 22:59 GMT
Comment by Ian Kelling (IanKelling) - Monday, 02 November 2015, 02:34 GMT
This bug still exists. Some relevant journalctl lines are below; they also appear in the full journalctl output posted earlier.

Tested on two x64 machines: one desktop and one Lenovo ThinkPad X200.
Tested 2 isos:
archlinux-2015.11.01-dual.iso
archlinux-2015.10.01-dual.iso


Nov 01 16:36:30 archiso systemd-udevd[345]: Error changing net interface name 'eth0' to 'enp6s0': Device or resource busy
Nov 01 16:36:30 archiso systemd-udevd[345]: could not rename interface '2' from 'eth0' to 'enp6s0': Device or resource busy

...

Nov 01 16:36:31 archiso dhcpcd[393]: enp6s0: interface not found or invalid
Nov 01 16:36:31 archiso dhcpcd[393]: dhcpcd exited
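
To check whether a given boot hit the same failure, the rename errors can be filtered out of the journal. This is only a sketch: failed_renames is a hypothetical helper name, and the pattern matches the exact systemd-udevd wording in the excerpt above, which may differ in other systemd versions.

```shell
# Sketch: print "old new" for each interface rename that udev failed to
# perform with "Device or resource busy". Reads journal text on stdin.
# failed_renames is a hypothetical helper name.
failed_renames() {
    sed -n "s/.*Error changing net interface name '\([^']*\)' to '\([^']*\)': Device or resource busy.*/\1 \2/p"
}

# Typical use on a booted system:
#   journalctl -b | failed_renames
```

Each output line names the interface that kept its kernel name, followed by the predictable name that dhcpcd was then started on.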


I updated the pxe wiki article so the workaround is documented.

If there is something that needs testing, I'd be happy to do it. One idea is to put net.ifnames=0 in the network boot grub menu items.
Comment by Gerardo Exequiel Pozzi (djgera) - Saturday, 19 March 2016, 01:08 GMT
Yes, please test this other patch; it seems the best way: https://lists.archlinux.org/pipermail/arch-releng/2016-March/003667.html

Thanks.
Comment by Jamin Collins (jamincollins) - Wednesday, 15 March 2017, 04:29 GMT
From what I can see, this is *still* an issue with the 2017-03-01 release. At least when PXE booting.
Comment by Gerardo Exequiel Pozzi (djgera) - Wednesday, 15 March 2017, 04:30 GMT
Please attach info.
Comment by Jamin Collins (jamincollins) - Wednesday, 15 March 2017, 18:59 GMT
# uname -a
Linux archiso 4.9.11-1-ARCH #1 SMP PREEMPT Sun Feb 19 13:45:52 UTC 2017 x86_64 GNU/Linux


Without net.ifnames at all:

# systemctl status dhcpcd@enp0s25.service
dhcpcd@enp0s25.service - dhcpcd on enp0s25
Loaded: loaded (/usr/lib/systemd/system/dhcpcd@.service; disabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Wed 2017-03-15 12:37:32 UTC; 6min ago
Process: 390 ExecStart=/usr/bin/dhcpcd -q -w %I (code=exited, status=1/FAILURE)

Mar 15 12:37:32 archiso systemd[1]: Starting dhcpcd on enp0s25...
Mar 15 12:37:32 archiso systemd[1]: dhcpcd@enp0s25.service: Control process exited, code=exited status=1
Mar 15 12:37:32 archiso systemd[1]: Failed to start dhcpcd on enp0s25.
Mar 15 12:37:32 archiso systemd[1]: dhcpcd@enp0s25.service: Unit entered failed state.
Mar 15 12:37:32 archiso systemd[1]: dhcpcd@enp0s25.service: Failed with result 'exit-code'.

# cat /proc/cmdline
BOOT_IMAGE=(tftp)/archiso/arch/boot/x86_64/vmlinuz archisobasedir=/arch archiso_http_srv=192.168.10.22 ip=:::::eth0:dhcp

# cat /etc/resolv.conf
#
# /etc/resolv.conf
#

#search <yourdomain.tld>
#nameserver <ip>

# End of file


With net.ifnames=1

# systemctl status dhcpcd@enp0s25.service
dhcpcd@enp0s25.service - dhcpcd on enp0s25
Loaded: loaded (/usr/lib/systemd/system/dhcpcd@.service; disabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Wed 2017-03-15 12:46:35 UTC; 32s ago
Process: 380 ExecStart=/usr/bin/dhcpcd -q -w %I (code=exited, status=1/FAILURE)

Mar 15 12:46:35 archiso systemd[1]: Starting dhcpcd on enp0s25...
Mar 15 12:46:35 archiso systemd[1]: dhcpcd@enp0s25.service: Control process exited, code=exited status=1
Mar 15 12:46:35 archiso systemd[1]: Failed to start dhcpcd on enp0s25.
Mar 15 12:46:35 archiso systemd[1]: dhcpcd@enp0s25.service: Unit entered failed state.
Mar 15 12:46:35 archiso systemd[1]: dhcpcd@enp0s25.service: Failed with result 'exit-code'.

# cat /proc/cmdline
BOOT_IMAGE=(tftp)/archiso/arch/boot/x86_64/vmlinuz archisobasedir=/arch archiso_http_srv=192.168.10.22 ip=:::::eth0:dhcp net.ifnames=1

# cat /etc/resolv.conf
#
# /etc/resolv.conf
#

#search <yourdomain.tld>
#nameserver <ip>

# End of file


With net.ifnames=0

# systemctl status dhcpcd@eth0.service
dhcpcd@eth0.service - dhcpcd on eth0
Loaded: loaded (/usr/lib/systemd/system/dhcpcd@.service; disabled; vendor preset: disabled)
Active: active (running) since Wed 2017-03-15 12:53:53 UTC; 35s ago
Process: 388 ExecStart=/usr/bin/dhcpcd -q -w %I (code=exited, status=0/SUCCESS)
Main PID: 463 (dhcpcd)
Tasks: 1 (limit: 4915)
CGroup: /system.slice/system-dhcpcd.slice/dhcpcd@eth0.service
└─463 /usr/bin/dhcpcd -q -w eth0

Mar 15 12:53:53 archiso dhcpcd[388]: DUID 00:01:00:01:20:5b:f5:e1:3c:97:0e:0f:c9:5a
Mar 15 12:53:53 archiso dhcpcd[388]: eth0: IAID 0e:0f:c9:5a
Mar 15 12:53:53 archiso dhcpcd[388]: eth0: soliciting a DHCP lease
Mar 15 12:53:53 archiso dhcpcd[388]: eth0: offered 192.168.10.101 from 192.168.10.22
Mar 15 12:53:53 archiso dhcpcd[388]: eth0: leased 192.168.10.101 for 14400 seconds
Mar 15 12:53:53 archiso dhcpcd[388]: eth0: adding route to 192.168.10.0/24
Mar 15 12:53:53 archiso dhcpcd[388]: eth0: adding default route via 192.168.10.1
Mar 15 12:53:53 archiso systemd[1]: Started dhcpcd on eth0.
Mar 15 12:53:53 archiso dhcpcd[463]: eth0: soliciting an IPv6 router
Mar 15 12:54:05 archiso dhcpcd[463]: eth0: no IPv6 Routers available

# cat /proc/cmdline
BOOT_IMAGE=(tftp)/archiso/arch/boot/x86_64/vmlinuz archisobasedir=/arch archiso_http_srv=192.168.10.22 ip=:::::eth0:dhcp net.ifnames=0

# cat /etc/resolv.conf
# Generated by resolvconf
domain asgardsrealm.net
nameserver 192.168.10.22
nameserver 192.168.10.2


Comment by Jamin Collins (jamincollins) - Wednesday, 15 March 2017, 19:27 GMT
And similar to the previous report (with net.ifnames=1):

...
Mar 15 13:22:07 archiso systemd-udevd[302]: Error changing net interface name 'eth0' to 'enp0s25': Device or resource busy
Mar 15 13:22:07 archiso systemd-udevd[302]: could not rename interface '2' from 'eth0' to 'enp0s25': Device or resource busy
...
Comment by Gerardo Exequiel Pozzi (djgera) - Thursday, 16 March 2017, 15:14 GMT
OK, I will apply the patch and disable network interface renaming altogether.
Comment by Gerardo Exequiel Pozzi (djgera) - Friday, 07 April 2017, 14:11 GMT
@jamincollins: did you run any tests with the patch?
Comment by Jamin Collins (jamincollins) - Friday, 07 April 2017, 21:20 GMT
The patch as referenced does not apply as the structure of the referenced file has changed.

Near as I can tell the intent is something like this:
$ diff -u /usr/lib/initcpio/hooks/archiso_pxe_common.orig /usr/lib/initcpio/hooks/archiso_pxe_common
--- /usr/lib/initcpio/hooks/archiso_pxe_common.orig 2017-04-07 13:51:43.999839112 -0700
+++ /usr/lib/initcpio/hooks/archiso_pxe_common 2017-04-07 13:52:09.536574991 -0700
@@ -68,5 +68,7 @@
elif [[ "${copy_resolvconf}" != "n" && -f /etc/resolv.conf ]]; then
cp /etc/resolv.conf /new_root/etc/resolv.conf
fi
+ ln -s /dev/null /new_root/etc/udev/rules.d/80-net-name-slot.rules
+ rm /new_root/etc/udev/rules.d/81-dhcpcd.rules
fi
}

While this change does appear to prevent the errors around attempting (and failing) to rename the interface, it does *not* result in a functional system.

root@archiso ~ # systemctl status dhcpcd@eth0.service
dhcpcd@eth0.service - dhcpcd on eth0
Loaded: loaded (/usr/lib/systemd/system/dhcpcd@.service; disabled; vendor pre
Active: inactive (dead)

root@archiso ~ # systemctl list-units | grep dhcp | wc -l :(
0

root@archiso ~ # cat /etc/resolv.conf
#
# /etc/resolv.conf
#

#search <yourdomain.tld>
#nameserver <ip>

# End of file
Comment by Francois Dupoux (fdupoux) - Sunday, 24 May 2020, 17:17 GMT
I had a similar issue where the DHCP client in the final root file system was never executed because the two commands used to de-configure the network interface after the PXE boot code were not executed in the late hook. These two commands are only run when the BOOTIF boot parameter is set, typically when using SYSAPPEND with recent versions of pxelinux.

if [[ -n "${bootif_dev}" ]]; then
ip addr flush dev "${bootif_dev}"
ip link set "${bootif_dev}" down
fi

I have created a patch so it resets the IP configuration on all network interfaces, so it does not need BOOTIF to be set, and so the DHCP client in the final root file system can run again.
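
As a rough illustration of the idea (not the linked patch itself), the same two commands can be applied to every non-loopback interface instead of only "${bootif_dev}". The names reset_all_interfaces, SYS_NET and RUN are hypothetical; SYS_NET and RUN exist only so the loop can be dry-run against a scratch directory.

```shell
# Sketch only: deconfigure every non-loopback interface so the DHCP client
# in the final root can configure it again. Not the actual patch.
# reset_all_interfaces, SYS_NET and RUN are hypothetical names.
reset_all_interfaces() {
    sys_net=${SYS_NET:-/sys/class/net}
    for path in "$sys_net"/*; do
        # Skip non-directories and an unmatched glob
        [ -d "$path" ] || continue
        dev=${path##*/}
        # Leave the loopback interface alone
        [ "$dev" = "lo" ] && continue
        # With RUN=echo the commands are printed instead of executed
        ${RUN:-} ip addr flush dev "$dev"
        ${RUN:-} ip link set "$dev" down
    done
}

# Dry run: RUN=echo reset_all_interfaces
```

Running it for real requires root and takes the links down, so over NFS or NBD it would cut off the root file system unless the image has already been copied to RAM.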

You can find the patch and all the details there:
- https://gitlab.com/fdupoux/sysresccd-src/-/tree/master/patches
- https://gitlab.com/fdupoux/sysresccd-src/-/issues/19

Could you please consider applying this patch (and ideally my other unrelated patch) to the archiso sources.
Comment by David Runge (dvzrv) - Wednesday, 10 June 2020, 07:45 GMT
@fdupoux is this still relevant for booting with archiso v44 (e.g. image >= 2020.06)? I have removed dhcpcd in favor of systemd-networkd
Comment by Francois Dupoux (fdupoux) - Wednesday, 10 June 2020, 07:53 GMT
I guess it is still relevant. The issue we have is that the network interface is not deconfigured at the end of the PXE boot process, and this prevents the DHCP client from running in the final environment (after the switch to the new root). So changing the DHCP client should make no difference, and actually it is the ipconfig program (not dhcpcd) which archiso_pxe_common uses to configure the network interface via DHCP.
Comment by David Runge (dvzrv) - Wednesday, 10 June 2020, 08:08 GMT
@fdupoux: Thanks for the clarification!

I will look into these fixes for sure! Thanks for providing them (I already saw you offered pull-requests for this on github [1]).
We'll switch to our gitlab soonish though (outside collaboration is still not yet possible due to technical reasons) and github will really only be a readonly mirror.

That being said: The fixes seem like something we will want to include!

[1] https://github.com/archlinux/archiso/pulls
Comment by Francois Dupoux (fdupoux) - Wednesday, 10 June 2020, 08:10 GMT
Thanks David
Comment by David Runge (dvzrv) - Wednesday, 18 November 2020, 09:40 GMT
@fdupoux: Would you mind opening a merge request to fix this? As you have already applied this for sysresccd it should be fairly trivial to add this to archiso now! :)
