FS#13299 - [initscripts] /etc/rc.d/network WIRELESS_TIMEOUT check doesn't work all the time

Attached to Project: Arch Linux
Opened by Aaron Griffin (phrakture) - Monday, 16 February 2009, 04:10 GMT
Last edited by James Rayner (iphitus) - Tuesday, 02 March 2010, 10:07 GMT
Task Type: Bug Report
Category: Packages: Core
Status: Closed
Assigned To: Aaron Griffin (phrakture), James Rayner (iphitus), Thomas Bächler (brain0)
Architecture: All
Severity: Low
Priority: Normal
Reported Version: None
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 9
Private: No

Details

The current wireless implementation in /etc/rc.d/network uses $(iwgetid $INTERFACE -ar) to check if it is associated with an AP before continuing. Apparently this doesn't work all the time. I know Thomas reported it being quasi-functional for him, and for me my wireless appears to NEVER associate with an AP until dhcp is run.

I'm not sure of a proper fix for this, but I would recommend removing this check completely, and just relying on WIRELESS_TIMEOUT for the sleep. I have commented the AP check out on my machine and it works great.
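
As a rough illustration of the kind of check being discussed, here is a minimal sketch. It assumes that an unassociated card makes "iwgetid <iface> -ar" print either nothing or the all-zero MAC address; the helper name ap_looks_valid is made up for this example and is not taken from /etc/rc.d/network.

```shell
# Hypothetical helper, not the code in /etc/rc.d/network.
# Decides whether the string printed by "iwgetid <iface> -ar" looks like
# a real access point address.
# Assumption: "not associated" shows up as an empty string or all zeros.
ap_looks_valid() {
    case "$1" in
        ""|00:00:00:00:00:00) return 1 ;;   # no AP reported
        *) return 0 ;;                      # anything else is treated as an AP
    esac
}

# Usage on a real machine (not run here):
#   ap_looks_valid "$(iwgetid wlan0 -ar)" && echo "associated"
```

The helper only inspects a string, so it can be tried without any wireless hardware.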

Closed by  James Rayner (iphitus)
Tuesday, 02 March 2010, 10:07 GMT
Reason for closing:  Implemented
Comment by James Rayner (iphitus) - Monday, 16 February 2009, 09:51 GMT
Not associating until dhcp just seems completely broken to me. For example, I use a static IP over wireless, which breaks the assumption that dhcp will be used.

There are two downsides to disabling the check:
- The wait for association may last longer than necessary
- People will receive DHCP fail messages when they're really having association problems, due to incorrect key, essid, range, etc.

I'll make some changes to the present wireless code, making the check "opt out".

I'd like to try and move to the wpa_supplicant/dbus based wireless as soon as I can, it _shouldn't_ have these sorts of problems, though I really have no idea. The code's written, it's working, I just need to package it and put it in testing. I'll do that now.

What driver? I'm guessing this is your Thinkpad?
Comment by James Rayner (iphitus) - Monday, 16 February 2009, 09:53 GMT
Oops, you mean rc.d/network, not netcfg. This might explain some odd bugs I've had for netcfg. Would you be able to test and check if it's also affected?

I still think this is odd behaviour, so maybe an option to opt-out, or opt-in of the check?
Comment by Aaron Griffin (phrakture) - Monday, 16 February 2009, 16:53 GMT
@james: I thought you wrote this section in rc.d/network, which is why I added you.

If I do the following:
ifconfig wlan0 up
iwconfig wlan0 essid foobar
...wait for some time...
iwgetid wlan0 -ar

I get the invalid MAC address... always. I never get an AP listed. I ran it in a loop sleeping 1 second in-between and never got a valid AP. However, dhcpcd works fine.

While I agree that this is broken, the check is also broken if it requires an AP before running dhcpcd. The DHCP request will fail on its own if there is no AP, so I wonder why this check is even needed. Perhaps if the next step (dhcp or whatever) fails, THEN we could check to see if there's an AP so we can output an informational message.

For the record - this has actually never worked for me - using either ndiswrapper, madwifi, or ath5k (current)
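
The "check only after failure" idea above could be sketched roughly as follows. This is a minimal sketch, not a patch for rc.d/network: the function name dhcp_with_diagnosis and the messages are invented for illustration, and it assumes dhcpcd and iwgetid are on PATH and that an all-zero MAC means "not associated".

```shell
# Hypothetical sketch: run DHCP first, look at the AP only if DHCP fails.
# Assumptions: dhcpcd and iwgetid are installed; all-zero MAC = no AP.
dhcp_with_diagnosis() {
    iface=$1
    if dhcpcd "$iface"; then
        return 0                          # DHCP worked, nothing to diagnose
    fi
    # DHCP failed: ask the card which AP (if any) it is associated with.
    ap=$(iwgetid "$iface" -ar 2>/dev/null) || ap=""
    case "$ap" in
        ""|00:00:00:00:00:00)
            echo "$iface: DHCP failed and no access point is associated (check essid, key, range)" >&2 ;;
        *)
            echo "$iface: DHCP failed although associated with $ap" >&2 ;;
    esac
    return 1
}
```

With this ordering the association check never blocks a working setup; it only changes which error message is printed.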
Comment by James Rayner (iphitus) - Monday, 16 February 2009, 22:46 GMT
OK. That's a good idea: move the check after dhcpcd and run it only if dhcp fails.

Just checking, it does show an AP after dhcp has worked?
Comment by Aaron Griffin (phrakture) - Monday, 16 February 2009, 22:53 GMT
Yeah, it's valid after I have a connection - this could be related to my router (Tomato firmware)
Comment by James Rayner (iphitus) - Tuesday, 17 February 2009, 01:23 GMT
I'll make a patch for rc.d/network to do what you suggested above. I have tomato here so I'll try to reproduce it.

Could you please give netcfg in [core] a test and see if it is also affected? /etc/network.d/examples/wep.example is the appropriate example config. If it also fails, could you comment out the line that includes "wep_check" in /usr/lib/network/wireless.subr and replace it with sleep 20?

Thanks!
Comment by James Rayner (iphitus) - Tuesday, 17 February 2009, 02:29 GMT
ok, maybe I might have this worked out. maybe.

I have tomato firmware, set wireless to WEP, 64bit. I'm borrowing my girlfriend's eeePC which has ath5k, ubuntu, 2.6.27, dhclient

Case 1 - fail
rmmod ath5k; modprobe ath5k;
iwconfig wlan0 essid rayner key 017CF3C0EE
iwgetid wlan0 -ra reports 00:00:00:00:00:00.
dhclient brings the interface up, allowing it to associate and then get an IP.

Case 2 - works
rmmod ath5k; modprobe ath5k;
ifconfig wlan0 up (or ip link set wlan0 up)
iwconfig wlan0 essid rayner key 017CF3C0EE
iwgetid wlan0 -ra reports the AP mac correctly.
The interface is already up, allowing ath5k to associate before dhcp.

/etc/rc.d/network does not run ifconfig wlan0 up (or ip link set wlan0 up) before attempting to apply the wireless settings.

This does not affect my ieee80211 stack based ipw2100 as ifconfig up/down does not enable/disable the driver/hardware in any way.
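
The order that worked in Case 2 can be written down as a small function. This is only a sketch under assumptions: ifconfig, iwconfig and iwgetid are installed, an all-zero MAC means "not associated", and the function name wireless_up is hypothetical. The 2 second default mirrors the WIRELESS_TIMEOUT default mentioned elsewhere in this report.

```shell
# Hypothetical sketch of the Case 2 order: up first, then wireless settings.
# Assumptions: net-tools and wireless-tools are installed; all-zero MAC = no AP.
wireless_up() {
    iface=$1; shift
    ifconfig "$iface" up || return 1       # bring the device up first
    iwconfig "$iface" "$@" || return 1     # then apply essid/key settings
    sleep "${WIRELESS_TIMEOUT:-2}"         # give the driver time to associate
    ap=$(iwgetid "$iface" -ar 2>/dev/null) || ap=""
    # Succeed only if an AP address other than all zeros is reported.
    [ -n "$ap" ] && [ "$ap" != "00:00:00:00:00:00" ]
}

# Example (not run here): wireless_up wlan0 essid foobar
```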
Comment by John (CapnJB) - Monday, 09 March 2009, 17:35 GMT
Hey guys, bug #11273 is a duplicate of this. Also, I posted a solution that works for me in the comments section of that bug report. Unfortunately I have very little hardware on which I can test this, but let me know if using "ifconfig wlan0 up" after running "iwconfig wlan0 ..." works for any of you.
Comment by Aaron Griffin (phrakture) - Monday, 09 March 2009, 19:12 GMT
I still think it might be a good idea to only check for a valid AP after DHCP or ifconfig fails.
Comment by James Rayner (iphitus) - Tuesday, 10 March 2009, 07:56 GMT
I'll disagree; it's a weird and illogical check. Using dhcp without associating is like trying to use plain dial up without a phone line.

The ifconfig up as suggested _should_ fix any issue. The check was failing because the device had not been brought up, and thus had not associated. Without the check, DHCP was bringing the interface up and allowing it to associate.

Comment by Aaron Griffin (phrakture) - Tuesday, 10 March 2009, 16:07 GMT
I thought I tried that and it still failed for me, which is why I suggested this route. I will check this again and report back (but can't do it remotely)
Comment by Hatem Nassrat (pykler) - Thursday, 19 March 2009, 12:42 GMT
I also do not recommend commenting out the check, but it is definitely in the wrong place. A lot of poor people on the tubes have been commenting that check out. So please review the attached patch so that we can help them out.

--
Hatem
Comment by James Rayner (iphitus) - Thursday, 19 March 2009, 13:13 GMT
I don't want to make any changes until I hear back from Aaron, or until the "lot of poor people on the tubes" comment and tell me the fix doesn't work. I'd rather have a simple and logical fix than one that just seems odd and hacky.

In my testing on similar hardware adding "ifconfig up" fixes it. It makes sense too.
Comment by Aaron Griffin (phrakture) - Thursday, 19 March 2009, 20:42 GMT
Hmmm, I added an "ifconfig $foo up" line, and I think I've restarted twice since then and haven't had an issue. My guess is it works, but it's a weak verification.

This saturday is "bug day", so ping me then and I will do more vigorous testing if you want it
Comment by Hatem Nassrat (pykler) - Thursday, 19 March 2009, 21:47 GMT
The bug and the patch above are really simple. I'm not sure why it is "hacky".

The original bug was due to the wireless check, which tries to see whether an access point has associated with the interface. From my understanding this only happens when settling on IP addresses (I believe that's the protocol, but I need to double check).

What I did in the patch is split out the check so that it is done after the IF UP line, thus only checking whether the association occurred after settling on the IP address :)
Comment by Aaron Griffin (phrakture) - Thursday, 19 March 2009, 21:57 GMT
I don't think James' "hacky" comment was about your patch.

I kinda like the patch - James, what do you think?
Comment by vincent (altrus) - Tuesday, 24 March 2009, 06:15 GMT
I'd like to group myself with 'all the poor people in the tubes'.

This patch resolved a really annoying bug for me. Provided it meets with your approval, I would really love to see this merged into [core].
Comment by James Rayner (iphitus) - Tuesday, 24 March 2009, 08:52 GMT
Adding "ifconfig interface up" appears to be a solution to the bug. From my test, and what's been posted here, it appears the interface has not been brought up, and the driver requires this for association. It worked in my limited test, and nobody has suggested otherwise yet.

The proposed patch is a "workaround": it relies on dhcpcd/ifconfig to bring the interface up later on, when that should be done explicitly beforehand, and then moves the check out of its logical sequence.

That's what I think. *shrug*
Comment by Thomas Bächler (brain0) - Tuesday, 24 March 2009, 09:52 GMT
First bringing the device up is necessary for association, thus this is not a workaround, but the only proper implementation!
Comment by Marcello Maggioni (Kariddi) - Tuesday, 24 March 2009, 12:06 GMT
I had this problem too on my ath9k card (Asus EEE PC 1000HE)

Solved by adding "/sbin/ifconfig $wlan_${1} up" between the iwconfig and the sleep command
Comment by Hatem Nassrat (pykler) - Tuesday, 24 March 2009, 16:39 GMT
@James, where did you add this ifaceup, in the rc.conf ???

The patch (I am sure you noticed) is for the init script, which is used to bring up the interface. As Thomas mentioned, this is the "only proper implementation".

To associate with an access point you need to use the iface. If the iface is down, the check will most certainly fail. If the iface is up, then the check will have a purpose.
Comment by Hatem Nassrat (pykler) - Tuesday, 24 March 2009, 16:52 GMT
@James, also, if I understand correctly, what you are doing is exactly the same as what the patch is doing, except that you have two calls to ifconfig up :), which isn't ideal.
Comment by Aaron Griffin (phrakture) - Tuesday, 24 March 2009, 18:04 GMT
@James: To be clear, the patch does the same thing, just slightly differently than you're expecting it

iwconfig will still work fine if the interface is not up. So rather than calling ifconfig, iwconfig, and ifconfig again, this patch maintains the exact same order (iwconfig, ifconfig). It just moves the check until after ifconfig is called to bring up the device.

My only qualm is that now the check is being called for all interfaces, without a check to see if it is a wireless interface first.
Comment by Hatem Nassrat (pykler) - Tuesday, 24 March 2009, 21:28 GMT
@Aaron, I had that qualm as well. I was going to do something about it when I realized that this is what is happening before the patch (i.e. now). So with or without the patch, both iwconfig and the check are called. It seems to work fine for other ifaces though, which makes me think that iwconfig & iwgetid just ignore the other ifaces (not sure).
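
One way to address the "is it a wireless interface?" qualm is to look at sysfs before running the check. The following is a minimal sketch, assuming a kernel that exposes a "wireless" or "phy80211" entry under /sys/class/net/<iface> for wireless devices. The function name is_wireless and the SYSFS_NET override are made up for this example.

```shell
# Hypothetical helper: guess whether an interface is wireless from sysfs.
# Assumption: wireless devices have a "wireless" or "phy80211" entry there.
# SYSFS_NET is an illustrative override for trying this on a scratch directory.
is_wireless() {
    base="${SYSFS_NET:-/sys/class/net}/$1"
    [ -e "$base/wireless" ] || [ -e "$base/phy80211" ]
}

# Example (not run here):
#   is_wireless wlan0 && echo "run the association check"
```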
Comment by James Rayner (iphitus) - Tuesday, 24 March 2009, 23:09 GMT
Just like Thomas said, adding an ifconfig interface up between the iwconfig and the sleep is the correct implementation. I'm not sure why you like this patch, Aaron. I can see what it does, but it's clearly a workaround, not the proper solution.

Presently, iwconfig is run, there's a 2 second sleep (or more) to allow for association, and then ifconfig/dhcp runs. On cards affected by this bug that 2 second sleep does nothing, as the device has not been brought up.

Hatem: Presently iwconfig is only called if someone configures a wlan_ line. Whether you think two calls to ifconfig are ideal or not, it is the proper implementation.
Comment by Hatem Nassrat (pykler) - Tuesday, 24 March 2009, 23:29 GMT
@James, please reread "Thomas Bächler (brain0)" comment and please re-read the Patch, I think you have misread both.
Comment by Aaron Griffin (phrakture) - Wednesday, 25 March 2009, 14:54 GMT
Yeesh, so much hassle over a little change. I like the patch because:
a) It breaks out the check, which is nice.
b) Does the check AFTER dhcp/ifconfig/whatever is run, not before

See, if this check was done AFTER dhcp, this problem never ever would have appeared. As I said, commenting the check out works completely fine for me. My interface works perfectly, but the check was aborting early. Not only does it address the problem, but it also addresses the REASON the problem existed in the first place.

If you're going to add an "ifconfig up" in there, that's fine. BUT, please at least move the check much later, as this patch does.
Comment by Ray (ataraxia) - Wednesday, 22 April 2009, 21:11 GMT
What's become of this fix? I also have solved this problem locally by doing "ifconfig up" on the interface before checking association. Associations are not "real" until the radio power is turned on, so it makes sense to me to turn it on before trying to use the interface.
Comment by Rogutės (rogutes) - Wednesday, 20 May 2009, 19:45 GMT
Ping!
I have acquired a ZyDAS WLA-54L WiFi and it doesn't work with /etc/rc.d/network, because "ifconfig wlan0 up" has to be run for it to associate. Could someone please apply the patch, upon which everybody seemed to agree long ago?

--- network 2009-05-20 22:33:54.984146784 +0300
+++ /etc/rc.d/network 2009-05-20 22:34:33.589417597 +0300
@@ -38,6 +38,7 @@
eval iwcfg="\$wlan_${1}"
[ "$iwcfg" = "" ] && return 0

+ /sbin/ifconfig $iwcfg up
/usr/sbin/iwconfig $iwcfg
[[ -z "$WIRELESS_TIMEOUT" ]] && WIRELESS_TIMEOUT=2
sleep $WIRELESS_TIMEOUT
Comment by Hatem Nassrat (pykler) - Thursday, 04 June 2009, 15:12 GMT
@rogutes

I think the reason the patch you mentioned was not applied is that it causes "ifconfig wlan up" to be called more than once (exactly two times). This is somewhat of a hack. If you see the patch I posted a while back, it only modifies the order in which things happen and does not introduce such hacks.

I am not sure why it's taking this long. Due to this delay I have noticed that Arch may be bleeding edge with respect to packages, but they are quite slow at actually fixing things :(. I have currently moved away from Arch; I may come back later to see if things have changed.
Comment by James Rayner (iphitus) - Saturday, 06 June 2009, 01:42 GMT
Hatem: There's nothing wrong with calling ifconfig multiple times, it's necessary. There is only one "ifconfig up" call here. Each ifconfig call has a different purpose. The first call "ifconfig up" brings the device up, the second one configures the device. dhcpcd only brings the device up if it is not already.

Many wireless cards will not begin authenticating and will not begin associating until the device has been brought up. Notably all the mac80211 based drivers. As a result the association check will always fail on these devices.

Have a look at this logfile from wpa_supplicant. Line 10. The first thing it does after loading its configuration is to bring the device up. First bringing the device up is necessary for association. And association is necessary for further configuration.

Phrakture or anyone with initscripts commit access, could you merge the attached patch?
Comment by Stephen (wingedsubmariner) - Saturday, 06 June 2009, 14:49 GMT
Here's another patch -- it does an ifconfig before the iwconfig just like in James's patch above, but only does it for wireless interfaces. In addition, it changes the AP check so it polls the interface every second, so it can end before the timeout and the boot can be faster. You can also set WIRELESS_TIMEOUT to -1 to bypass the check completely, for faster booting but potentially more confusing error messages down the road.
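
The polling behaviour described here could look roughly like the sketch below. It is not the attached patch, only an illustration under assumptions: iwgetid is installed, an all-zero MAC means "not associated", and the function name wait_for_association is hypothetical.

```shell
# Hypothetical sketch of polling once per second; not the attached patch.
# Assumptions: iwgetid is installed; all-zero MAC = no AP;
# WIRELESS_TIMEOUT=-1 means "skip the check".
wait_for_association() {
    iface=$1
    timeout=${WIRELESS_TIMEOUT:-2}
    if [ "$timeout" -lt 0 ]; then
        return 0                          # check disabled
    fi
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        ap=$(iwgetid "$iface" -ar 2>/dev/null) || ap=""
        if [ -n "$ap" ] && [ "$ap" != "00:00:00:00:00:00" ]; then
            return 0                      # associated, stop waiting early
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1                              # timed out without an AP
}
```

Compared with a fixed sleep, the loop returns as soon as an AP is reported, so WIRELESS_TIMEOUT becomes an upper bound rather than a fixed delay.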

The patch is made with just a "diff oldfile newfile". Anyone know of a howto for making the git style patches?
Comment by Stephen (wingedsubmariner) - Saturday, 06 June 2009, 14:52 GMT
Sorry, wrote a bug into that patch. The attached file is correct.
Comment by James Rayner (iphitus) - Tuesday, 30 June 2009, 01:17 GMT
Patch looks fine, could somebody with initscripts commit access please merge the last attached patch (06/06/09)?
Comment by newgargamel (newgargamel) - Tuesday, 11 August 2009, 20:32 GMT
Was it merged? I've just tried the newest ISO and this problem still exists.
Comment by Aaron Griffin (phrakture) - Tuesday, 11 August 2009, 21:06 GMT
Sadly, this was not merged. I guess it got away from me a bit.

I merged in James' change (simply adding an ifconfig up at the top of the ifup function)

It will be in the next initscripts package
Comment by Hatem Nassrat (pykler) - Friday, 25 September 2009, 14:11 GMT
Any reason why calling ifconfig up twice was the right thing to do?
Comment by Aaron Griffin (phrakture) - Friday, 25 September 2009, 17:18 GMT
It's not called twice...

From git HEAD:
$ grep "\<up\>" network
/sbin/ifconfig $ifname up
# bring up bridge interfaces
# bring up ethernet interfaces
# bring up bond interfaces
# bring up routes

ifconfig is called twice, yes, but the "up" simply brings the interface up so we can associate properly
Comment by Paul Mattal (paul) - Sunday, 06 December 2009, 19:29 GMT
Wow. That was a long read. I think I'm more or less up to speed.

So has this acceptable fix been merged yet? It sounds like there was resolution, but the merging to initscripts might not have happened yet.
Comment by Paul Mattal (paul) - Wednesday, 06 January 2010, 05:04 GMT
I can't find anything that looks like this patch literally being merged into initscripts, but maybe the following commit by James handles this by bringing up the interface early on?

http://projects.archlinux.org/initscripts.git/commit/?id=ac3baddf04b62e4bb55f7a2d0d34d78191ac815d

If folks haven't been having this problem anymore since the 2009.08-1 initscripts release, can we call this fixed?
Comment by Paul Mattal (paul) - Monday, 25 January 2010, 13:40 GMT
I will close this on 2/6 (next bug day) if nobody writes in to say it still isn't working.
