FS#3369 - ldap group management breaks udev on start

Attached to Project: Arch Linux
Opened by Alex Matviychuk (alexmat) - Friday, 21 October 2005, 20:33 GMT
Last edited by arjan timmerman (blaasvis) - Wednesday, 02 November 2005, 10:13 GMT
Task Type Bug Report
Category System
Status Closed
Assigned To Judd Vinet (judd)
Architecture not specified
Severity High
Priority Normal
Reported Version 0.7 Wombat
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

I've been using udev with LDAP user and group authentication for a while now, but the latest udev update leaves my machine stuck at [busy] while loading udev. If I set nsswitch.conf to "group files" instead of "group files ldap", everything works fine; "group files ldap" used to work fine too. I checked, and LDAP groups work fine if I enable them after bootup. The catch is that the LDAP server isn't reachable before udev starts, but this was never necessary before.
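For reference, the two settings being compared correspond to a `group:` line in /etc/nsswitch.conf like the ones below. This is only a fragment; the other databases (passwd, shadow, hosts, ...) on the reporter's machine are not shown and are not assumed here. With the default nsswitch behaviour, any group lookup that /etc/group does not answer falls through to the next source listed.

```
# /etc/nsswitch.conf (fragment)
# Setting that hangs at [busy] while udev starts: group lookups not
# answered by /etc/group go on to LDAP, which needs the network.
group: files ldap

# Setting that boots fine, but without LDAP groups:
#group: files
```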

Closed by Roman Kyrylych (Romashka)
Wednesday, 03 January 2007, 21:15 GMT
Reason for closing: None
Comment by Alex Matviychuk (alexmat) - Friday, 21 October 2005, 20:35 GMT
Also, I built the latest udev (071) using the udev PKGBUILD from ABS, and it doesn't seem to help. So it appears to be a config issue.
Comment by Alex Matviychuk (alexmat) - Thursday, 15 December 2005, 21:00 GMT
After a ton of headaches and hours upon hours of googling, I think I found the problem. I was using a workaround to rewrite the nsswitch.conf file on every reboot before udev came up, but for some reason the new kernel with initrd wouldn't allow the workstations to mount the root partition rw before udev came up, even though I put the rw switch in menu.lst.

OK, that's all history now, because I got sick of finding workarounds for workarounds. I dug into udev and found what causes it to halt. It is indeed trying to resolve groups against an LDAP server that is up but only reachable over the network, and since the network services don't start without udev, it becomes a circular dependency.

However, udev worked just fine a few updates ago, so what happened? udev.rules used to assign devices to groups using numbers... that is, until recently; now half the rules use numbers and half use names. I switched all group names to their numerical mappings and voilà! Everything is smooth again.
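The manual conversion described here (replacing group names in udev.rules with their numeric GIDs) could be scripted roughly as follows. This is a minimal sketch under stated assumptions, not part of udev or Arch Linux: `groups_to_gids` is a hypothetical helper, it assumes assignments of the form GROUP="name", and it reads the name-to-GID mapping straight from a local group file (default /etc/group) with awk, so no NSS (and therefore no LDAP) lookup is involved. Names with no entry in that file are left untouched, and an empty group file is not handled.

```shell
# Hypothetical helper (not part of udev or Arch Linux):
# rewrite GROUP="name" assignments in a udev rules file to GROUP="gid",
# taking the name -> GID mapping from a local group file only, so no
# NSS/LDAP lookup is performed. Unknown names are left as they are.
#
# Usage: groups_to_gids RULES_FILE [GROUP_FILE]   (result on stdout)
groups_to_gids() {
    rules_file=$1
    group_file=${2:-/etc/group}

    awk '
        # First file (the group file): fields are name:passwd:gid:members.
        FNR == NR {
            split($0, f, ":")
            gid[f[1]] = f[3]
            next
        }
        # Second file (the rules): replace each GROUP="name" on the line.
        {
            rest = $0
            out = ""
            while (match(rest, /GROUP="[^"]*"/)) {
                # Skip the 7 chars of GROUP=" and the closing quote.
                name = substr(rest, RSTART + 7, RLENGTH - 8)
                if (name in gid)
                    repl = "GROUP=\"" gid[name] "\""
                else
                    repl = substr(rest, RSTART, RLENGTH)
                out = out substr(rest, 1, RSTART - 1) repl
                rest = substr(rest, RSTART + RLENGTH)
            }
            print out rest
        }
    ' "$group_file" "$rules_file"
}
```

Usage would be something like `groups_to_gids /etc/udev/rules.d/udev.rules > udev.rules.numeric` (the rules path is an assumption; check where your udev package installs it). Since GIDs can differ between systems, the output is only valid for the machine whose group file was read, and it would need to be regenerated after every udev update that replaces udev.rules.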

I don't know how to resolve this cleanly for udev and LDAP. Putting in numbers instead of group names is a chore, and not all systems may use the same mappings (although I would think most people stick with the default group mappings). However, I can't imagine how nss_ldap can work with the current udev, because the system insists on timing out while waiting for an LDAP server it's never going to reach.

I read the nsswitch.conf man page, and there were some interesting bits in there about status codes like TRYAGAIN and UNAVAIL; however, I could not get any of them to make udev skip the LDAP entry in nsswitch.conf at boot.

There must be a nice way to do this that I am overlooking. Help me Obi Judd Kenobi! You're my only hope ;P
Comment by eS.eF. (Dimorph) - Sunday, 18 December 2005, 16:29 GMT
I have the same problem on my network with around 25 clients.
A workaround solved it:

--- start_udev 2005-12-18 17:21:41.000000000 +0100
+++ start_udev.new 2005-12-18 17:21:09.000000000 +0100
@@ -92,8 +92,8 @@
# You can use the shell scripts above by calling run_udev or execute udevstart
# which does the same thing, but much faster by not using shell.
# only comment out one of the following lines.
-#run_udev
-/sbin/udevstart
+run_udev
+#/sbin/udevstart

echo "making extra nodes"
make_extra_nodes
Comment by Alex Matviychuk (alexmat) - Monday, 19 December 2005, 00:15 GMT
Interesting, I'll give this a shot.

Thanks for sharing :)
Comment by Tobias Powalowski (tpowa) - Tuesday, 20 December 2005, 06:58 GMT
Is this still an issue in 078, or only in 068?
Comment by Alex Matviychuk (alexmat) - Tuesday, 20 December 2005, 07:39 GMT
It is still in 078 as far as I can tell; more specifically, it is in the udev.rules file. The groups can't be resolved if you have ldap set in nsswitch.conf.
Comment by Tobias Powalowski (tpowa) - Wednesday, 21 December 2005, 07:53 GMT
I don't understand what you mean by numbers; all rules in udev.rules contain GROUP="group", not a number.

EDIT:
Which version of udev did work?
Comment by Alex Matviychuk (alexmat) - Wednesday, 21 December 2005, 09:22 GMT
udev is not the problem, it is udev.rules (which, looking through CVS, has been changing like crazy). And you're right, the default udev.rules has never used numbers. It simply didn't use group="$VALUE" until revision 1.8 of udev.rules. Here is where things began to break:

http://cvs.archlinux.org/cgi-bin/viewcvs.cgi/base/udev/udev.rules.diff?r1=1.7&r2=1.8&cvsroot=Current&only_with_tag=MAIN

Version 1.7 of udev.rules was the last one to work, because the rules in 1.8 contain group="$VALUE". When the system boots, udev tries to resolve $VALUE to a number. I have my nsswitch.conf set to "group files ldap", so it tries to resolve against an LDAP server and times out (hangs indefinitely). I changed all the group names to group numbers and udev now starts up fine, but that change is undone by every udev update. My options are to disable LDAP (not really an option for me at this point) or to mess with udev all the time (although I still haven't tried the suggestion from Dimorph; maybe that will help).

Just to be clear, version 1.7 of udev.rules works fine with ldap, 1.8 does not.
Comment by Alex Matviychuk (alexmat) - Wednesday, 21 December 2005, 09:24 GMT
Here is the link to a full version of udev.rules v1.7: http://cvs.archlinux.org/cgi-bin/viewcvs.cgi/base/udev/udev.rules?rev=1.7&cvsroot=Current&only_with_tag=MAIN&content-type=text/vnd.viewcvs-markup

You'll notice there are no instances of group="$VALUE".
Comment by Tobias Powalowski (tpowa) - Wednesday, 21 December 2005, 10:35 GMT
Well, group= is needed because udev uses it for permissions.
All the big distros use group="name", so I think that's the standard way.
I don't know how the others deal with LDAP, but their start_udev doesn't differ from ours.
Comment by Alex Matviychuk (alexmat) - Wednesday, 21 December 2005, 17:35 GMT
I did a little digging on Google, and the best I could find was this post on the Red Hat (fedora-devel) mailing list, where a user is having the exact same issue:
https://www.redhat.com/archives/fedora-devel-list/2005-September/msg00406.html

Unfortunately, that thread doesn't resolve anything.

I know I'm not the only one with the problem, and it may not be Arch Linux-specific, but it is a problem as far as I can tell. Are any of the devs using an nss_ldap setup with a recent udev?
Comment by Alex Matviychuk (alexmat) - Wednesday, 21 December 2005, 17:38 GMT
Also, just for reference, here is a recent Arch Linux forum thread with several others reporting the same issue: http://bbs.archlinux.org/viewtopic.php?t=16519
Comment by Judd Vinet (judd) - Monday, 26 December 2005, 18:00 GMT
Hi Alex,

Did the workaround from Dimorph work at all? If so, we can use that until a fix comes from upstream.
Comment by David Rosenstrauch (darose) - Thursday, 16 March 2006, 20:09 GMT
I'm having the same problem - with udev 087-1. Has anyone come up with a workaround for this yet? It's a horrible bug!
Comment by David Rosenstrauch (darose) - Thursday, 16 March 2006, 20:31 GMT
Never mind. One of the suggestions here worked.

I set "bind_policy soft" in /etc/nss_ldap.conf and now all is well. I get messages that a few LDAP lookups fail early in the boot (before the modules are loaded and the network is started), but it just continues on its merry way after that, and LDAP kicks in fine once the network comes up.
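The fix described here is a single line in nss_ldap's configuration file. A sketch of the fragment, assuming the file is /etc/nss_ldap.conf as in this report (other distributions may use a different path, such as /etc/ldap.conf). Per the nss_ldap documentation, the default `bind_policy hard` keeps retrying an unreachable server with exponential backoff, while `soft` makes nss_ldap return immediately on server failure, so an early-boot lookup fails instead of blocking.

```
# /etc/nss_ldap.conf (fragment)
# Return immediately when the LDAP server is unreachable instead of
# retrying with backoff (the default "hard" policy), so group lookups
# made by udev before the network is up fail fast and boot continues.
bind_policy soft
```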

Thanks for the suggestions, all.
