FS#19351 - [network-ups-tools] 2.4.3-1 leaves nut dead
Attached to Project:
Community Packages
Opened by David C. Rankin (drankinatty) - Tuesday, 04 May 2010, 04:29 GMT
Last edited by Thorsten Töpper (Atsutane) - Tuesday, 21 September 2010, 19:14 GMT
Opened by David C. Rankin (drankinatty) - Tuesday, 04 May 2010, 04:29 GMT
Last edited by Thorsten Töpper (Atsutane) - Tuesday, 21 September 2010, 19:14 GMT
|
Details
Description:
After update to 2.6.33.3-1, network-ups-tools (nut/upsd) starts, but cannot communicate with itself on localhost or across the network anymore. I received the update to nut-2.4.3-1 with a normal pacman -Syu and I then updated abs and built nut using the default abs configuration and nut failed in the same way. Network-ups-tools starts fine, but then appears to disconnect (or not be able to connect) from the upses that it monitors. As a work-around to the problem, I dropped back to the abs network-ups-tools-2.4.1 PKGBUILD that I still had laying around and rebuilt it against the new kernel. It works just fine. So the problem looks like a problem in the latest network-ups-tools-2.4.3-1 (at least on both my x86_64 boxes) in the interim, I've just added --ignore network-ups-tools to my pacman shortcut so I don't keep pulling in the problem package. Here is what I get in my logs with the new packages: May 3 00:23:17 nirvana upsd[27067]: listening on 192.168.6.17 port 3493 May 3 00:23:17 nirvana upsd[27067]: listening on 127.0.0.1 port 3493 May 3 00:23:17 nirvana upsd[27068]: Startup successful May 3 00:23:17 nirvana upsmon[27070]: Startup successful May 3 00:23:17 nirvana upsmon[27072]: Communications with UPS archangel_ups@archangel.3111skyline.com lost May 3 00:23:17 nirvana wall[27074]: wall: user nut broadcasted 1 lines (70 chars) May 3 00:23:22 nirvana upsmon[27072]: UPS archangel_ups@archangel.3111skyline.com is unavailable May 3 00:23:22 nirvana upsmon[27072]: Communications with UPS nirvana_ups@localhost lost May 3 00:23:22 nirvana wall[27077]: wall: user nut broadcasted 1 lines (60 chars) May 3 00:23:22 nirvana wall[27079]: wall: user nut broadcasted 1 lines (52 chars) May 3 00:23:27 nirvana upsmon[27072]: UPS nirvana_ups@localhost is unavailable May 3 00:23:27 nirvana wall[27084]: wall: user nut broadcasted 1 lines (42 chars) May 3 00:28:23 nirvana upsmon[27072]: UPS archangel_ups@archangel.3111skyline.com is unavailable May 3 00:28:23 nirvana wall[27104]: wall: user nut broadcasted 1 lines (60 chars) May 3 00:28:28 nirvana upsmon[27072]: UPS nirvana_ups@localhost is unavailable May 3 00:28:28 nirvana wall[27108]: wall: user nut broadcasted 1 lines (42 chars) and every few minute or two, this gets dumped to the terminal: [00:25 nirvana:/home/david] # Broadcast message from nut (Mon May 3 00:28:23 2010): UPS archangel_ups@archangel.3111skyline.com is unavailable Broadcast message from nut (Mon May 3 00:28:28 2010): UPS nirvana_ups@localhost is unavailable and on the second Arch box [00:24 archangel:/home/david] # Broadcast message from nut (Mon May 3 00:28:41 2010): UPS archangel_ups@localhost is unavailable Broadcast message from nut (Mon May 3 00:28:41 2010): UPS nirvana_ups@nirvana.3111skyline.com is unavailable Additional info: I have two CyberPower AVRLCD 1000 ups'es that have always done fine. They use the usbhid-ups interface through /dev/hiddev0 and each monitor each other over the LAN. Steps to reproduce: pacman -Syu and then look for the errors above. I tried to get additional input from the mailing list before filing, but I had no luck. I've been through the config files and the pacnew files and configuration isn't an issue. I'm stumped. Let me know if you need any additional information, logs, whatever and I'll be glad to chase it down for you. P.S. - Don't scream. This is a community package and I can't find community in the 'Category' list at the time I filed. So if I'm not supposed to file this here, I apologize. Just let me know what I need to do if that is the case. Thanks. |
This task depends upon
Closed by Thorsten Töpper (Atsutane)
Tuesday, 21 September 2010, 19:14 GMT
Reason for closing: None
Additional comments about closing: Moved to AUR as we did not get it fixed, no maintainer: http://mailman.archlinux.org/pipermail/a ur-general/2010-September/010921.html
Tuesday, 21 September 2010, 19:14 GMT
Reason for closing: None
Additional comments about closing: Moved to AUR as we did not get it fixed, no maintainer: http://mailman.archlinux.org/pipermail/a ur-general/2010-September/010921.html
[09:54 nirvana:/home/david/arch/pkg] # tail -n400 /var/log/everything.log
May 18 09:15:41 nirvana upsd[1812]: listening on 192.168.6.17 port 3493
May 18 09:15:41 nirvana upsd[1812]: listening on 127.0.0.1 port 3493
May 18 09:15:41 nirvana upsd[1812]: Can't connect to UPS [nirvana_ups] (usbhid-ups-nirvana_ups): No such file or directory
May 18 09:15:41 nirvana upsd[1866]: Startup successful
May 18 09:15:41 nirvana upsmon[1898]: Startup successful
May 18 09:15:41 nirvana upsmon[1900]: Login on UPS [archangel_ups@archangel.3111skyline.com] failed - got [ERR ACCESS-DENIED]
(this is normal - not logging in)
May 18 09:15:41 nirvana upsmon[1900]: Login on UPS [nirvana_ups@localhost] failed - got [ERR ACCESS-DENIED]
(this is normal - not logging in)
May 18 09:15:46 nirvana upsmon[1900]: Poll UPS [nirvana_ups@localhost] failed - Driver not connected
May 18 09:15:46 nirvana upsmon[1900]: Communications with UPS nirvana_ups@localhost lost
May 18 09:15:51 nirvana upsmon[1900]: Poll UPS [nirvana_ups@localhost] failed - Driver not connected
May 18 09:15:51 nirvana upsmon[1900]: UPS nirvana_ups@localhost is unavailable
May 18 09:15:56 nirvana upsmon[1900]: Poll UPS [nirvana_ups@localhost] failed - Driver not connected
<snip a lot>
May 18 09:24:11 nirvana upsmon[1900]: Poll UPS [nirvana_ups@localhost] failed - Driver not connected
<snip many more>
May 18 09:25:41 nirvana upsd[1866]: Can't connect to UPS [nirvana_ups] (usbhid-ups-nirvana_ups): No such file or directory
May 18 09:26:01 nirvana upsmon[1900]: UPS nirvana_ups@localhost is unavailable
May 18 09:26:01 nirvana wall[2130]: wall: user nut broadcasted 1 lines (42 chars)
I don't understand the error:
May 18 09:25:41 nirvana upsd[1866]: Can't connect to UPS [nirvana_ups] (usbhid-ups-nirvana_ups): No such file or directory
Obviously there is no 'usbhid-ups-nirvana_ups', it's 'usbhid-ups', but why is the new package looking for 'usbhid-ups-nirvana_ups' to begin with?
Let me know if you need more information. Thanks.
Jun 3 10:02:34 dragon upsd[17155]: listening on 140.232.0.52 port 3493
Jun 3 10:02:34 dragon upsd[17155]: listening on 127.0.0.1 port 3493
Jun 3 10:02:34 dragon upsd[17155]: Can't connect to UPS [apcups] (usbhid-ups-apcups): No such file or directory
Jun 3 10:02:34 dragon upsd[17156]: Startup successful
Jun 3 10:02:34 dragon upsmon[17158]: Startup successful
Jun 3 10:02:34 dragon upsmon[17160]: Login on UPS [apcups@127.0.0.1] failed - got [ERR ACCESS-DENIED]
Jun 3 10:02:39 dragon upsmon[17160]: Poll UPS [apcups@127.0.0.1] failed - Driver not connected
Jun 3 10:02:39 dragon upsmon[17160]: Communications with UPS apcups@127.0.0.1 lost
Kernel version: kernel26 2.6.33.4-1
NUTS version: network-ups-tools 2.4.3-1
[1] With every version from 2.4.1 to 2.4.3:
# tail -f /var/log/everything.log|grep ups
Jul 7 19:03:38 hell upsd[9496]: listening on 127.0.0.1 port 3493
Jul 7 19:03:38 hell upsd[9496]: listening on ::1 port 3493
Jul 7 19:03:38 hell upsd[9496]: Can't connect to UPS [upsname] (powerpanel-upsname): No such file or directory
Jul 7 19:03:38 hell upsd[9497]: Startup successful
Jul 7 19:03:42 hell upsd[9497]: mainloop: Interrupted system call
Jul 7 19:03:42 hell upsd[9497]: Signal 15: exiting
Sorry for the delay. There are a couple of issue with nut that need to be fixed. Network-UPS-Tools 2.4.3 does NOT work at all with the CyberPower AVR1000LCD upses I have. They have worked perfectly with Network-UPS-Tools up to network-ups-tools 2.4.1-2. Ever since then NUT fails to communicate with the ups on Arch.
I have pulled ABS and built in many times. Same result it doesn't work. It doesn't matter where you put the parts (either /srv/http where it has traditionally been, or /usr/share/webapps which is it's new location [I don't like that location]), the bottom line is I can completely re-arrange my setup (httpd, cgi, and php.ini) to draw the html and cgi functions from /usr/share/webapps, but the NUT program will not even communicate with the UPS on localhost in 2.4.3
Downgrading to 2.4.1-2 fixes the problem every time.
I have contacted Arjen DeKorte who is the NUT dev upstream. Here are his comments:
Guys,
Below are the thoughts from the NUT developer on what to check in the Arch NUT package to see if recent changes to udev bus designations, and permission are incorporated. My guess is that it is udev related since 50% of the time it boots and loads usbhid-ups just fine and everything works, other times, it hangs loading usbhid-ups and nothing works. Hope this helps.
-------- Original Message --------
Subject: Re: [Nut-upsdev] usbhid-ups causing hang on boot with 2.6.35 - any ideas?
Date: Fri, 03 Sep 2010 10:49:27 +0200
From: Arjen de Korte <nut+devel@de-korte.org>
To: nut-upsdev@lists.alioth.debian.org
Citeren "David C. Rankin" <drankinatty>:
> (Arjen - I apologize if you get two copies, I sent the first one to
> the old de-korte.org address)
That address isn't old, but the mailserver is configured to only
accept messages from the mailinglist server.
> Aug 30 20:31:03 archangel upsd[2295]: listening on 192.168.6.14 port 3493
> Aug 30 20:31:03 archangel upsd[2295]: listening on 127.0.0.1 port 3493
> Aug 30 20:31:03 archangel upsd[2295]: Can't connect to UPS
> [archangel_ups] (usbhid-ups-archangel_ups): No such file or directory
This means the driver isn't running (but you probably already knew that).
> Aug 30 20:31:03 archangel upsd[2296]: Startup successful
> Aug 30 20:31:03 archangel upsmon[2298]: Startup successful
> Aug 30 20:31:03 archangel upsmon[2300]: Login on UPS
> [archangel_ups@localhost] failed - got [ERR ACCESS-DENIED]
> Aug 30 20:31:03 archangel upsmon[2300]: Login on UPS
> [nirvana_ups@nirvana.3111skyline.com] failed - got [ERR ACCESS-DENIED]
The above two error messages indicate there is a problem with logging
into the server. You could run the server in debug mode, but my guess
is this has to do with either NUT being build with tcp-wrappers
support and you failed setting this up properly (see 'man 8 upsd') or
the credentials you're using are incorrect.
[...]
> Looks like some type of udev issue or kernel issue, but I
> thought I would check here to see if you guys have any more info on
> the problem. Anybody seen this behavior before?
This looks like the udev rules are incorrect. If memory serves, there
have been some changes in the syntax (BUS was changed to SUBSYSTEMS).
Other than that, there are no known issues.
Best regards, Arjen
My guess is that it might be udev.
The next issue has to do with the usbhid-ups driver that hangs on load with 2.6.35.4-1. That stops the boot process completely until the 120 sec timeout occurs, the usbhid-ups module fails to load an no communication at all is capable with the UPS. Also, the usb mouse is dead when you reach the kdm login. Here are the dmesg errors:
udev: starting version 161
<snip>
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
<snip>
usbcore: registered new interface driver hiddev
<snip>
INFO: task modprobe:1567 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
modprobe D 0000000000000000 0 1567 1565 0x00000000
ffff88021fcf9a28 0000000000000082 ffff88022fc90000 ffff880200000000
0000000000014f40 0000000000014f40 ffff88021fcf9fd8 ffff88021fcf9fd8
ffff88021fcf9fd8 ffff880224a96f00 ffff88021fcf9fd8 0000000000014f40
Call Trace:
[<ffffffff81071b6b>] ? prepare_to_wait+0x5b/0x90
[<ffffffffa0283ba5>] usb_kill_urb+0x85/0xc0 [usbcore]
[<ffffffff810718d0>] ? autoremove_wake_function+0x0/0x40
[<ffffffffa02a3831>] usbhid_init_reports+0xb1/0x120 [usbhid]
[<ffffffffa02a3d53>] usbhid_start+0x4b3/0x5a0 [usbhid]
[<ffffffffa02506d8>] hid_device_probe+0x98/0xe0 [hid]
[<ffffffff812877ea>] ? driver_sysfs_add+0x5a/0x90
[<ffffffff81287ac6>] driver_probe_device+0x96/0x1c0
[<ffffffff81287c90>] ? __device_attach+0x0/0x60
[<ffffffff81287cdb>] __device_attach+0x4b/0x60
[<ffffffff812866a4>] bus_for_each_drv+0x64/0x90
[<ffffffff8128795f>] device_attach+0x8f/0xb0
[<ffffffff81287115>] bus_probe_device+0x25/0x40
[<ffffffff81284e5f>] device_add+0x4ff/0x5e0
[<ffffffffa02500a7>] hid_add_device+0x87/0x1b0 [hid]
[<ffffffffa02a14b9>] usbhid_probe+0x329/0x500 [usbhid]
[<ffffffffa0288a2b>] usb_probe_interface+0xfb/0x1f0 [usbcore]
[<ffffffff81287ac6>] driver_probe_device+0x96/0x1c0
[<ffffffff81287c8b>] __driver_attach+0x9b/0xa0
[<ffffffff81287bf0>] ? __driver_attach+0x0/0xa0
[<ffffffff812869fe>] bus_for_each_dev+0x5e/0x90
[<ffffffff81287789>] driver_attach+0x19/0x20
[<ffffffff81287297>] bus_add_driver+0xc7/0x2e0
[<ffffffff81287f01>] driver_register+0x71/0x140
[<ffffffffa02876f8>] usb_register_driver+0xb8/0x170 [usbcore]
[<ffffffffa01b6000>] ? hid_init+0x0/0xd1 [usbhid]
[<ffffffffa01b6093>] hid_init+0x93/0xd1 [usbhid]
[<ffffffff81002149>] do_one_initcall+0x39/0x1a0
[<ffffffff8108cefb>] sys_init_module+0xbb/0x200
[<ffffffff81009e82>] system_call_fastpath+0x16/0x1b
INFO: task modprobe:1567 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
modprobe D 0000000000000000 0 1567 1565 0x00000000
ffff88021fcf9a28 0000000000000082 ffff88022fc90000 ffff880200000000
0000000000014f40 0000000000014f40 ffff88021fcf9fd8 ffff88021fcf9fd8
ffff88021fcf9fd8 ffff880224a96f00 ffff88021fcf9fd8 0000000000014f40
Call Trace:
[<ffffffff81071b6b>] ? prepare_to_wait+0x5b/0x90
[<ffffffffa0283ba5>] usb_kill_urb+0x85/0xc0 [usbcore]
[<ffffffff810718d0>] ? autoremove_wake_function+0x0/0x40
[<ffffffffa02a3831>] usbhid_init_reports+0xb1/0x120 [usbhid]
[<ffffffffa02a3d53>] usbhid_start+0x4b3/0x5a0 [usbhid]
[<ffffffffa02506d8>] hid_device_probe+0x98/0xe0 [hid]
[<ffffffff812877ea>] ? driver_sysfs_add+0x5a/0x90
[<ffffffff81287ac6>] driver_probe_device+0x96/0x1c0
[<ffffffff81287c90>] ? __device_attach+0x0/0x60
[<ffffffff81287cdb>] __device_attach+0x4b/0x60
[<ffffffff812866a4>] bus_for_each_drv+0x64/0x90
[<ffffffff8128795f>] device_attach+0x8f/0xb0
[<ffffffff81287115>] bus_probe_device+0x25/0x40
[<ffffffff81284e5f>] device_add+0x4ff/0x5e0
[<ffffffffa02500a7>] hid_add_device+0x87/0x1b0 [hid]
[<ffffffffa02a14b9>] usbhid_probe+0x329/0x500 [usbhid]
[<ffffffffa0288a2b>] usb_probe_interface+0xfb/0x1f0 [usbcore]
[<ffffffff81287ac6>] driver_probe_device+0x96/0x1c0
[<ffffffff81287c8b>] __driver_attach+0x9b/0xa0
[<ffffffff81287bf0>] ? __driver_attach+0x0/0xa0
[<ffffffff812869fe>] bus_for_each_dev+0x5e/0x90
[<ffffffff81287789>] driver_attach+0x19/0x20
[<ffffffff81287297>] bus_add_driver+0xc7/0x2e0
[<ffffffff81287f01>] driver_register+0x71/0x140
[<ffffffffa02876f8>] usb_register_driver+0xb8/0x170 [usbcore]
[<ffffffffa01b6000>] ? hid_init+0x0/0xd1 [usbhid]
[<ffffffffa01b6093>] hid_init+0x93/0xd1 [usbhid]
[<ffffffff81002149>] do_one_initcall+0x39/0x1a0
[<ffffffff8108cefb>] sys_init_module+0xbb/0x200
[<ffffffff81009e82>] system_call_fastpath+0x16/0x1b
INFO: task usbhid-ups:2027 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
usbhid-ups D 0000000000000000 0 2027 1 0x00000000
ffff880224a6bb98 0000000000000082 ffff880224a6bae8 ffffffff00000000
0000000000014f40 0000000000014f40 ffff880224a6bfd8 ffff880224a6bfd8
ffff880224a6bfd8 ffff880224aec560 ffff880224a6bfd8 0000000000014f40
Call Trace:
[<ffffffff81371879>] __mutex_lock_slowpath+0x139/0x310
[<ffffffff81371a61>] mutex_lock+0x11/0x30
[<ffffffffa028d195>] usbdev_open+0x145/0x330 [usbcore]
[<ffffffff81127807>] chrdev_open+0x127/0x270
[<ffffffff811276e0>] ? chrdev_open+0x0/0x270
[<ffffffff81121d2b>] __dentry_open+0x11b/0x3c0
[<ffffffff811a263a>] ? security_inode_permission+0x1a/0x20
[<ffffffff81122f64>] nameidata_to_filp+0x54/0x70
[<ffffffff811311f0>] do_last+0x480/0x740
[<ffffffff8113169d>] do_filp_open+0x1ed/0x680
[<ffffffff8113cc74>] ? alloc_fd+0xf4/0x150
[<ffffffff81122fe4>] do_sys_open+0x64/0x130
[<ffffffff811230cb>] sys_open+0x1b/0x20
[<ffffffff81009e82>] system_call_fastpath+0x16/0x1b
Something is totally F'ed up with this usbhid-ups driver and the new kernel. 50% of the time, this error occurs. However, on the next boot, the usbhid-ups module loads fine. There is something very flaky in the way the kernel handles loading this. I have looked at this error and the normal load and I can't make heads or tails of it. The dmesg file for the failed load is attached as:
dmesg20100906.txt
The dmesg file showing the usbhid-ups driver loading without any problem is attached as:
dmesg20100906-1.txt
You will need to have a kernel and a udev guy look at this to see if we can figure out what is broken. Thank you for your help and I'll attached the files below.
dmesg20100906-1.txt (42.2 KiB)
hub 4-0:1.0: USB hub found
hub 4-0:1.0: 6 ports detected
usb 3-1: new low speed USB device using ohci_hcd and address 2
usbcore: registered new interface driver hiddev
usb 4-1: new low speed USB device using ohci_hcd and address 2
generic-usb 0003:0764:0501.0001: hiddev0,hidraw0: USB HID v1.10 Device [CPS UPS CP1000AVRLCD] on usb-0000:00:02.0-1/input0
usbcore: registered new interface driver usbhid
usbhid: USB HID core driver
input: Microsoft Microsoft IntelliMouse® Explorer as /devices/pci0000:00/0000:00:04.0/usb4/4-1/4-1:1.0/input/input4
generic-usb 0003:045E:0095.0002: input,hidraw1: USB HID v1.11 Mouse [Microsoft Microsoft IntelliMouse® Explorer] on usb-0000:00:04.0-1/input0
NOTE: There are NO config changes between the boot when the driver crashes and the boot when the driver load perfectly. It is literally a 50/50 chance of whether it boot correctly or crashes. We need to fix this first before we can sort out the 2.4.3 no communication with the ups issue. Thanks. Let me know what else you need.