FS#19369 - [kernel26] tg3 crash under high load

Attached to Project: Arch Linux
Opened by Sebastian Köhler (kart0ffelsack) - Wednesday, 05 May 2010, 21:27 GMT
Last edited by Jan de Groot (JGC) - Monday, 12 July 2010, 07:57 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To Tobias Powalowski (tpowa), Thomas Bächler (brain0)
Architecture All
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Whenever my network card is under high load, it stops sending and receiving packets after a few minutes. I have to manually bring the interface down and up again to make it work. "dmesg" gives this output:

WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x20b/0x220()
Hardware name: Studio 1537
NETDEV WATCHDOG: eth0 (tg3): transmit queue 0 timed out
Modules linked in: jfs ext3 jbd ext2 fuse vboxnetflt vboxdrv snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device usblp snd_hda_codec_atihdmi snd_pcm_oss snd_mixer_oss snd_hda_codec_idt dell_wmi fglrx(P) rtc_cmos lib80211_crypt_tkip wl(P) usb_storage rtc_core rtc_lib uvcvideo videodev sdhci_pci sdhci v4l1_compat lib80211 mmc_core coretemp snd_hda_intel firewire_ohci firewire_core snd_hda_codec snd_hwdep video output crc_itu_t led_class battery wmi snd_pcm ac snd_timer snd soundcore snd_page_alloc thermal button dell_laptop joydev rfkill radeon ttm drm_kms_helper cpufreq_ondemand drm i2c_algo_bit tg3 libphy acpi_cpufreq psmouse uhci_hcd freq_table intel_agp agpgart serio_raw iTCO_wdt sg iTCO_vendor_support evdev dcdbas ehci_hcd usbcore i2c_i801 i2c_core processor ext4 mbcache jbd2 crc16 aes_i586 aes_generic xts gf128mul dm_crypt dm_mod sr_mod cdrom sd_mod ahci libata scsi_mod
Pid: 0, comm: swapper Tainted: P 2.6.33-ARCH #1
Call Trace:
[<c1043b4d>] warn_slowpath_common+0x6d/0xa0
[<c125595b>] ? dev_watchdog+0x20b/0x220
[<c125595b>] ? dev_watchdog+0x20b/0x220
[<c1043bc6>] warn_slowpath_fmt+0x26/0x30
[<c125595b>] dev_watchdog+0x20b/0x220
[<c105c56f>] ? insert_work+0x5f/0xd0
[<c12d27a5>] ? _raw_spin_unlock_irqrestore+0x25/0x30
[<c105cb41>] ? __queue_work+0x31/0x40
[<c1050dee>] run_timer_softirq+0x12e/0x2f0
[<c106ece6>] ? tick_do_broadcast+0x36/0x70
[<c1255750>] ? dev_watchdog+0x0/0x220
[<c104a4bd>] __do_softirq+0x8d/0x1d0
[<c109648c>] ? handle_IRQ_event+0x4c/0x190
[<c1066e8e>] ? sched_clock_tick+0x5e/0x90
[<c1099054>] ? move_native_irq+0x14/0x50
[<c104a63d>] do_softirq+0x3d/0x50
[<c104a9fd>] irq_exit+0x6d/0x70
[<c1005b00>] do_IRQ+0x50/0xc0
[<c100a378>] ? sched_clock+0x8/0x10
[<c1066b0b>] ? sched_clock_local+0xab/0x1a0
[<c1003cb0>] common_interrupt+0x30/0x38
[<c106007b>] ? sys_clock_settime+0x6b/0xa0
[<f84e2173>] ? acpi_idle_enter_bm+0x255/0x286 [processor]
[<c12290ba>] cpuidle_idle_call+0x7a/0x120
[<c10020b4>] cpu_idle+0x84/0xd0
[<c12ccba2>] start_secondary+0x1bd/0x1c3
---[ end trace 8c331e076b17dbaf ]---
tg3: eth0: transmit timed out, resetting
tg3: DEBUG: MAC_TX_STATUS[0000000b] MAC_RX_STATUS[00000006]
tg3: DEBUG: RDMAC_STATUS[00000000] WDMAC_STATUS[00000000]
tg3: tg3_stop_block timed out, ofs=2c00 enable_bit=2
tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2
tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
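
The watchdog message in the log above and the manual down/up workaround could be scripted roughly as follows. This is only a sketch under assumptions: the report does not say which tool was used to cycle the interface, so the `ip link set` commands are a guess at an equivalent, and the helper names (`find_tx_timeouts`, `cycle_commands`, `cycle_interface`) are hypothetical.

```python
import re
import subprocess

# Matches the kernel's transmit-watchdog message as it appears in the log, e.g.
# "NETDEV WATCHDOG: eth0 (tg3): transmit queue 0 timed out"
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def find_tx_timeouts(log_text):
    """Return (interface, driver, queue) for every watchdog timeout in log_text."""
    return [
        (m.group("iface"), m.group("driver"), int(m.group("queue")))
        for m in WATCHDOG_RE.finditer(log_text)
    ]


def cycle_commands(iface):
    """Build the two commands that take iface down and bring it up again.

    Assumption: iproute2's `ip` is used; the original report names no tool.
    """
    return [
        ["ip", "link", "set", "dev", iface, "down"],
        ["ip", "link", "set", "dev", iface, "up"],
    ]


def cycle_interface(iface):
    """Run the down/up cycle. Needs root privileges."""
    for cmd in cycle_commands(iface):
        subprocess.run(cmd, check=True)
```

Feeding the output of dmesg to `find_tx_timeouts` and calling `cycle_interface` on each reported interface would automate the workaround, but it only hides the symptom; it does not address the driver problem itself.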

This is the card I am using:

# lspci -v
08:00.0 Ethernet controller: Broadcom Corporation NetLink BCM5784M Gigabit Ethernet PCIe (rev 10)
Subsystem: Dell Device 029f
Flags: bus master, fast devsel, latency 0, IRQ 32
Memory at fc100000 (64-bit, non-prefetchable) [size=64K]
Capabilities: [48] Power Management version 3
Capabilities: [40] Vital Product Data
Capabilities: [60] Vendor Specific Information: Len=6c <?>
Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [cc] Express Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting
Capabilities: [13c] Virtual Channel
Capabilities: [160] Device Serial Number 00-22-19-ff-fe-e7-02-25
Capabilities: [16c] Power Budgeting <?>
Kernel driver in use: tg3
Kernel modules: tg3

Additional info:
* kernel26: 2.6.33.3-1

Steps to reproduce:
1. Get a laptop with this card
2. Produce high load on eth0
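
The report does not say how the load was produced. One possible way to generate sustained outbound traffic is sketched below, assuming a peer on the wired network that accepts a TCP connection and discards whatever it receives. The function name `send_load`, the chunk size, and the peer address in the usage comment are all hypothetical.

```python
import time


def send_load(conn, seconds, chunk_size=64 * 1024):
    """Write zero-filled chunks to conn for roughly `seconds`; return bytes sent.

    conn is any object with a sendall() method, such as a connected TCP socket.
    """
    payload = bytes(chunk_size)
    deadline = time.monotonic() + seconds
    sent = 0
    while time.monotonic() < deadline:
        conn.sendall(payload)
        sent += chunk_size
    return sent


# Usage sketch (hypothetical peer address and port; the traffic must be
# routed through eth0 to exercise the tg3 driver):
#
#   import socket
#   with socket.create_connection(("192.0.2.10", 5001)) as conn:
#       print(send_load(conn, seconds=600), "bytes sent")
```

Running this for several minutes while watching the kernel log should show whether the transmit timeout appears. Whether this kind of traffic triggers the bug on a given kernel is not confirmed by the report.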

Closed by Jan de Groot (JGC)
Monday, 12 July 2010, 07:57 GMT
Reason for closing: Fixed
Additional comments about closing: 2010-07-04: A task closure has been requested. Reason for request: I just ran a short test and it looks like the error is gone in the new kernel. Please reopen if necessary.
Comment by Dan McGee (toofishes) - Friday, 07 May 2010, 13:58 GMT
Have you looked around upstream at all to see if others have hit this?
Comment by Sebastian Köhler (kart0ffelsack) - Friday, 07 May 2010, 16:45 GMT
Yes, I googled a lot. There are similar bug reports for other distributions, like
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/525046
https://bugzilla.redhat.com/show_bug.cgi?id=552288
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/294092
http://lists.mandriva.com/bugs/2010-04/msg02080.php

There is also an upstream report:
http://marc.info/?l=linux-netdev&m=125438760016456&w=2

But all those reports have one thing in common: they don't offer a solution.

I also filed a report upstream myself:
https://bugzilla.kernel.org/show_bug.cgi?id=15922
Comment by Gerardo Exequiel Pozzi (djgera) - Saturday, 03 July 2010, 18:20 GMT
  • Field changed: Summary (tg3 crash under high load → [kernel26] tg3 crash under high load)
any news with 2.6.34?
Comment by Sebastian Köhler (kart0ffelsack) - Sunday, 04 July 2010, 08:00 GMT
> any news with 2.6.34?

I just ran a short test and it looks like the error is gone in the new kernel.
