FS#27938 - [udev] udev-177 breaks loading of some firmware

Attached to Project: Arch Linux
Opened by Olivier (litemotiv) - Saturday, 14 January 2012, 12:04 GMT
Last edited by Tom Gundersen (tomegun) - Thursday, 15 March 2012, 12:38 GMT
Task Type Bug Report
Category Packages: Testing
Status Closed
Assigned To Tom Gundersen (tomegun)
Architecture All
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 14
Private No

Details

After upgrading to udev-177, modules that require firmware fail during boot-time. After boot has finished these modules can be loaded normally. System tested is stock Arch (no systemd) with [testing] enabled.

dmesg:

ieee80211 phy0: brcmsmac: fail to load firmware brcm/bcm43xx-0.fw
ieee80211 phy0: brcmsmac: Failed to find firmware usually in /lib/firmware/brcm

Similar report on mailing list: http://mailman.archlinux.org/pipermail/arch-dev-public/2012-January/022393.html
This task depends upon

Closed by  Tom Gundersen (tomegun)
Thursday, 15 March 2012, 12:38 GMT
Reason for closing:  Fixed
Additional comments about closing:  in testing
Comment by Tom Gundersen (tomegun) - Saturday, 14 January 2012, 13:06 GMT
By comparison, /lib/firmware/intel-ucode/06-0f-06 and /lib/firmware/iwlwifi-3945-2.ucode are successfully loaded for me.

It would be useful with more output from udev of a failure case, as well as confirmation that modprob'ing fixes the problem. Please set udev_log="debug" in /etc/udev/udev.conf, reboot, modprobe the relevant module and attach your dmesg.

@Pierres: I lost your dmesg, could you attach it here?
Comment by Olivier (litemotiv) - Saturday, 14 January 2012, 13:27 GMT
udev with debug and dmesg with failed firmware load and consequent manual modprobe.
Comment by Tom Gundersen (tomegun) - Saturday, 14 January 2012, 15:31 GMT
Kay just pointed me to the problem (kernel bug) and workaround (revert a udev patch). Please try this and report back if it works: http://dev.archlinux.org/~tomegun/.

For more info see: http://marc.info/?l=linux-netdev&m=132655490826765&w=2 .
Comment by Gerardo Exequiel Pozzi (djgera) - Saturday, 14 January 2012, 16:15 GMT
@Tom: Using udev-177-2-i686 does not fix the issue for me. (I tested using latest linux-3.2.1-1 and linux-lts-3.0.17-1).

Here is the relevant lines from everything.log after running: modprobe -r rtl8192se ; udevadm control --log-priority=debug ; udevadm trigger --action=add --property-match=PCI_ID=10EC:8172
Comment by Gerardo Exequiel Pozzi (djgera) - Saturday, 14 January 2012, 16:39 GMT
Aditional note: If I remove module manually (modprobe -r rtl8192se) and insert again (modprobe rtl8192se), with both udev-177-1 and udev-177-2 works.
Comment by KriK (krik) - Saturday, 14 January 2012, 19:14 GMT
Same problem with brcmsmac and firmware.
Comment by Keshav Amburay (the.ridikulus.rat) - Saturday, 14 January 2012, 19:19 GMT Comment by Tom Gundersen (tomegun) - Sunday, 15 January 2012, 00:11 GMT
Very odd that 177-2 did not fix this. Could anyone have a stab at bisecting the problem? We are now assuming that

"udevd: remove TIMEOUT= handling": 43d5c5f03645c4b842659f9b5bd0ae465e885e92

is the first bad commit. It would be very helpful if anyone could verify this by checking that the previous one is ok, and that this one is broken.
Comment by Gerardo Exequiel Pozzi (djgera) - Monday, 16 January 2012, 04:32 GMT
Today, I spent lost of hours bisecting and/or reseting to certain commit without any success. It a bit tedious due lots of changes between 175 and 176 in locations of source files and build time options. Sometimes each test I need to reboot the machine to be sure that its works, otherwise random things happens (even if terminate udev cleanly with cleanup-db and exit) and starting again and triggering uevents.

As results:
Works until this commit d1c13ddcb3497e0c6a37cd9bd117b74ef010bc15 + cherry-pick this other 86a0d004c397d985505c5204fe1fbecb340f6076.
Weird behaviour when build at 85eaf38c3b2fd70a8b01b72bbdb936c0a5944b3c + cherry-pick this other 86a0d004c397d985505c5204fe1fbecb340f6076 : rtl8192se+firmware is loaded but i915 modules is not loaded by udev.
(until this point I used build time options from udev-175)
I have package issues when I reached d1aacc0fa997a9757adc923792a6c17753d05084 (At this point I need build time options for 177 due previous commit).

I will experiment a bit more tomorrow...

Comment by Gerardo Exequiel Pozzi (djgera) - Monday, 16 January 2012, 18:54 GMT
For d1aacc0fa997a9757adc923792a6c17753d05084, I need to set --libexecdir=/lib/udev (this is changed ahead in 1495719024cae87fd42167ec77691ebb64ba20d7 to --libexecdir=/lib).

Today I build udev at this commit 86a0d004c397d985505c5204fe1fbecb340f6076, without success, this is: few modules are loaded, for example no rtl8192se is loaded, no i915...
Comment by Gerardo Exequiel Pozzi (djgera) - Monday, 16 January 2012, 19:03 GMT
I am thinking that maybe this weird behaviour is because I am using latest kmod and at such commit needs an older version...
Comment by Tom Gundersen (tomegun) - Monday, 16 January 2012, 19:04 GMT
@Gerardo: Thanks for your work, sorry that there is no resolution yet. We already discussed this on IRC, but for others reading this report, can you verify that resetting to just before the TIMEOUT commit does not solve the firmware problem?
Comment by Gerardo Exequiel Pozzi (djgera) - Wednesday, 18 January 2012, 07:08 GMT
News here: there is a recent commit in udev [#1], finally firmware is loaded but there is a long delay while "Waiting for uevents...". I can see the effect of the patch:

udevd[170]: worker [196] timeout, kill it
udevd[170]: seq 954 '/devices/pci0000:00/0000:00:1c.1/0000:07:00.0' killed
udevd[170]: worker [196] terminated by signal 9 (Killed)

http://git.kernel.org/?p=linux/hotplug/udev.git;a=commit;h=e64fae5573e566ce4fd9b23c68ac8f3096603314
Comment by Tom Gundersen (tomegun) - Wednesday, 18 January 2012, 11:12 GMT
Please try udev-177-3 from testing.

@Gerardo: do you see any warnings/error messages in your log if you go back to udev_log=error? I.e., will people who experience the timeout be able to tell what it's all about without turning on more logging?
Comment by Fabien Devaux (fdev31) - Wednesday, 18 January 2012, 12:18 GMT
I experience the described problem with brcmsmac driver and udev-177-3 is not fixing it.
Unloading + reloading it manually works around the problem.
Comment by Tom Gundersen (tomegun) - Wednesday, 18 January 2012, 12:45 GMT
@Gerardo: Could you verify that udev-177-3 works for you (in other words, that I did not mess up the package).

@Fabien: Do you have your wifi module in your initramfs? 'lsinitcpio /boot/<your initramfs>.img' will tell you. If so, you need to also have udev there (and make sure you have done 'mkinitcpio -p linux' after upgrading to udev-177-3). Could you set udev_log="debug" in /etc/udev/udev.conf and attach your dmesg + logs?
Comment by Fabien Devaux (fdev31) - Wednesday, 18 January 2012, 13:26 GMT
Yes, the whole firmware folder but the .txt file are in the initramfs, I added it via the FILES= entry and I didn't forgot the mkinitcpio.
I'm not sure where to find the udev-specific logs, here is the dmesg and the boot log:

http://pastebin.com/v3TLNWVz
http://pastebin.com/3VThfsKa

Just point me the udev log file, since it's missing (my screen was flooded with it on boot so I can't tell if something went wrong or not)...
Comment by Gerardo Exequiel Pozzi (djgera) - Wednesday, 18 January 2012, 15:08 GMT
177-3 works for me (rtl8192se). As expected, there is a timeout of 30 secods, after this, such messages are displayed on console (with default udev_log=err).
Comment by Gerardo Exequiel Pozzi (djgera) - Wednesday, 18 January 2012, 15:37 GMT
@Fabien: looks like pastebin links are dead, please use "Attach a file" here.
Comment by Shawn Nock (nocko) - Wednesday, 18 January 2012, 20:25 GMT
177-3 fixes rtl8192ce as well
Comment by Fabien Devaux (fdev31) - Wednesday, 18 January 2012, 21:41 GMT
Sorry for the delay...
   boot (578.2 KiB)
   dmesg.txt (45.1 KiB)
Comment by Tom Gundersen (tomegun) - Wednesday, 18 January 2012, 22:04 GMT
@Fabien: could I ask you to try with UDEV_TIMEOUT=120 in rc.conf? We set it to 30, which is too short, I will remove it completely and let it default to upstream's 120.

Edit: corrected 180 to 120, either would work, but it is 120 that is the upstream default.
Comment by Tom Gundersen (tomegun) - Thursday, 19 January 2012, 14:37 GMT
@Fabien (and anyone else still experiencing problems with the most recent udev from testing):

Could you please verify/deny if a new initscripts package fixes your issues: http://dev.archlinux.org/~tomegun/initscripts-2012.01.3-1-any.pkg.tar.xz.

If so I'll push that to testing asap.
Comment by Fabien Devaux (fdev31) - Thursday, 19 January 2012, 18:02 GMT
Changed the timeout solved my issue.
Now how can I do to boot as fast as before ? :P
Using dichotomy on the timeout value ? :/
Comment by Tom Gundersen (tomegun) - Thursday, 19 January 2012, 18:34 GMT
@Fabien: Awesome! Thanks for reporting back.

When the timeout is hit this means udev is encountering a broken kernel module, this will have to be fixed in the kernel (and afaik it is being fixed).

If you want to have a temporary workaround I guess it would be possible to blacklist your wireless module in /etc/modprobe.d and modprobe it manually in rc.local, or something like that. But really, that is just broken, we should rather fix the problem in the kernel.

Edit: fixed my broken suggestion for a workaround.
Comment by ajs124 (andy123) - Tuesday, 21 February 2012, 22:15 GMT
  • Field changed: Percent Complete (100% → 0%)
still occurs to me with udev 181-2:
Thu Feb 16 19:13:17 2012: udevd[203]: worker [214] terminated by signal 9 (Killed)
Comment by ajs124 (andy123) - Tuesday, 21 February 2012, 23:38 GMT
seems that the problem is the rtl8192ce driver for my wifi card in my thinkpad e325.
it looks like my problem was closer to https://bugs.archlinux.org/task/28154
i fixed it with this workaround: http://igordcard.blogspot.com/2012/01/waiting-for-udev-events-to-be-processed.html
Comment by Nuno (quarkup) - Monday, 12 March 2012, 00:22 GMT
Found the same problem, on a TOSHIBA L650-11F (maybe bluetooth stuff?)

running udev version 181-2
http://pastebin.com/gA9pNcA0

uname -a:
Linux alpha.archlinux 3.2.9-1-ARCH #1 SMP PREEMPT Thu Mar 1 09:31:13 CET 2012 x86_64 Intel(R) Core(TM) i5 CPU M 480 @ 2.67GHz GenuineIntel GNU/Linux

lspci and lsusb
http://pastebin.com/KH0KGeVt
Comment by Tom Gundersen (tomegun) - Tuesday, 13 March 2012, 23:23 GMT
A workaround for some is to add the offending module to MODULES. If that does not work, you can blacklist the offending module and load it manually in /etc/rc.local.

It looks like this is not going to get fixed unless someone with the right hardware is able to produce and test a fix, as the obvious fix that I have proposed does not seem to work, and I'm out of ideas.
Comment by Will Brokenbourgh (willbprog127) - Tuesday, 13 March 2012, 23:34 GMT
@tomegun: That is a solution that has been suggested many times and does not work for me. Thanks for closing my report.
Comment by Tom Gundersen (tomegun) - Tuesday, 13 March 2012, 23:54 GMT
@willbprog: this is the first i hear of blacklisting the module and modprobing manually not working. What output do you get from "modprobe -v <modulename>"?

Also, no offence meant by closing your bug. We currently have 6 reports of the same issue (even though it manifests slightly differently), and I'm struggling to keep track of all the comments. The old reports can be found in "Related Tasks" above.
Comment by Will Brokenbourgh (willbprog127) - Wednesday, 14 March 2012, 01:31 GMT
@tomegun: Doing a verbose manual modprobe gives no output...the module is already loaded. Also, wasn't so much offense at it being closed, just really irritated that ridiculously obscure things like exercise USB dongles are being included in the kernel and something important, like these driver issues, is being ignored.
Comment by Tom Gundersen (tomegun) - Wednesday, 14 March 2012, 02:09 GMT
@willbprog127: i need some more info to go on. Please blacklist the module, remove any manual modprobe, and increase udevs logging level (in /etc/udev/udev.conf). Reboot your system, and verify that the module is not loaded. Then start "udevadm monitor" in one console and "modprobe -v" your module in a different console. Please post here the output from udevadm monitor, modprobe as well as dmesg and /var/log/messages during the call to modprobe. Verify that the module is now loaded, and whether or not the device works.

This issue is not being ignored. I have personally spent several full days on it (none of my six computers are affected by the problem, so it is hard for me to test). As you can see above Gerardo spent a lot of time bisecting the problem. The udev maintainer has also spent a lot of time dealing with this, including raising awareness among the kernel developers. Some kernel developers have already made fixes for some modules, and the netdev maintainer has requested someone to step up and take over the ipw2200 module for the purpose of solving this problem. And that's just the ones I happen to know about.

You should keep in mind that different people with different skills work on different parts of the kernel, so even if we could force the USB dongle developers to change project (which we can't), they probably could not fix whatever we want fixing. I might as well say to you: provide a working and tested udev or kernel patch and I'll see to it that it will be applied.
Comment by Will Brokenbourgh (willbprog127) - Wednesday, 14 March 2012, 03:06 GMT
@tomegun: I'm sorry, I think you misunderstood me, but anyway...

(did the best I could to find the right timeframe in the attached logs)
Comment by Tom Gundersen (tomegun) - Wednesday, 14 March 2012, 12:58 GMT
@willbprog127: thanks for the feedback!

Hm, as you get "link becomes ready", it looks like this is not the same issue. In fact, I don't see any error messages from NetworkManager or similar, so where does it go wrong?

Could you revert to the latest working version of udev and post the same logs?
Comment by Will Brokenbourgh (willbprog127) - Wednesday, 14 March 2012, 16:02 GMT
@tomegun: My issue with udev hanging and networking not working was *before* I switched to NetworkManager. I was using just using wpa_supplicant and network daemon. When I would blacklist brcmsmac and then modprobe it in rc.local it wouldn't work, not even after 'settling down' time. Blacklisting and modprobing in rc.local *does* work properly with NetworkManager if I start NM in rc.local after the modprobe.
Comment by Tom Gundersen (tomegun) - Wednesday, 14 March 2012, 16:29 GMT
@willbprog127: ok. thanks for getting back to me. This is then a different issue.

The network daemon will need to be restarted after the module has loaded successfully, if that does not work it is a bug in the network daemon. NetworkManager should be able to deal with this even if it is started from DAEMONS and not restarted later, if it does not it is a NM bug. If you want you can open separate reports against initscripts/NetworkManager regarding this.
Comment by Will Brokenbourgh (willbprog127) - Wednesday, 14 March 2012, 16:35 GMT
@tomegun: Okay, thanks for the information. My original bug report, however, was because udev was/is hanging on loading of my brcmsmac driver, is there anything else I should report about that, or are the kernel net devs not going to fix this, or...?
Comment by Tom Gundersen (tomegun) - Wednesday, 14 March 2012, 17:02 GMT
@willbprog127: it seems the maintainer is working on it: http://article.gmane.org/gmane.linux.kernel.wireless.general/85716
Comment by Will Brokenbourgh (willbprog127) - Wednesday, 14 March 2012, 21:58 GMT
@tomegun: Thanks for the info. Hopefully they resolve this soon. I'd really like to use Arch on my laptop without workarounds, if at all possible. :)
Comment by Tom Gundersen (tomegun) - Thursday, 15 March 2012, 01:37 GMT
I tried adding a workaround to udev-181-5, currently in testing. Feedback would be appreciated:

1) update to udev-181-5
2) regenerate your initramfs: "minitcpio -p linux"
3) disable any workarounds you might have enabled
4) reboot

Please report back if this works or not for you, and what kernel module you are using. Ideally I'd like to hear one report for each affected module, so I know if this works "not at all", "only for some" or "for everyone".
Comment by Will Brokenbourgh (willbprog127) - Thursday, 15 March 2012, 02:40 GMT
@tomegun: Followed your instructions exactly, and now my Broadcom BCM43225 (brcmsmac) card works and udev-181-5 did NOT hang on boot :)
Comment by Gerardo Exequiel Pozzi (djgera) - Thursday, 15 March 2012, 03:37 GMT
rtl8192se works without hang and is UP with udev-181-5 (manual module loading via rc.conf, removed)
Comment by Antonio Muñoz (tomby) - Thursday, 15 March 2012, 07:15 GMT
@tomegun: It works here too with ipw2200. Thanks! :)
Comment by Tom Gundersen (tomegun) - Thursday, 15 March 2012, 12:25 GMT
Thanks for the feedback guys! Finally this problem is solved :-)

Loading...