FS#36865 - [linux] 3.11 - recent commits against skge.c render ethernet broken

Attached to Project: Arch Linux
Opened by John (graysky) - Tuesday, 10 September 2013, 19:57 GMT
Last edited by Tobias Powalowski (tpowa) - Sunday, 22 September 2013, 17:59 GMT
Task Type: Bug Report
Category: Packages: Testing
Status: Closed
Assigned To: Tobias Powalowski (tpowa), Thomas Bächler (brain0)
Architecture: All
Severity: High
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 2
Private: No

Details

Bug: Linux kernel v3.11.0 for my 10-year-old machine (Asus A7N8X-E Deluxe) gives me a partially functional network. I do not see anything obvious in the dmesg output to indicate that a problem exists.

Bug fix: Reverting the last 3 commits against drivers/net/ethernet/marvell/skge.c (contained in the attached patch) fixes the issue. I reported this upstream but am unsure how rapidly it will be acted upon.[1]

More info: In the bug state (i.e. using the unmodified 3.11 source):
What works:
* Netctl successfully brings up the interface
* I can ping internal IP addresses
* I can resolve domain names; wget can start a download from the AUR (aur.archlinux.org gets resolved to a numerical IP, but the download never starts).

What doesn't work:
* I cannot ping external addresses (names or numerical)
* I cannot ssh out of or into the box
* I cannot receive data via wget, which just hangs indefinitely.

1. http://marc.info/?l=linux-netdev&m=137884262727796&w=2

Additional info:
* package version(s) linux 3.11-1

Steps to reproduce:
Boot into 3.11-1-ARCH from [testing].

Closed by Tobias Powalowski (tpowa)
Sunday, 22 September 2013, 17:59 GMT
Reason for closing:  Fixed
Additional comments about closing:  3.11.1-2
Comment by John (graysky) - Friday, 13 September 2013, 19:56 GMT
Not sure what you guys want to do with this; I haven't heard anything from upstream in 72 h now, and the deadline to comment on 3.11.1-rc1 is fast approaching.
Comment by Tobias Powalowski (tpowa) - Saturday, 14 September 2013, 07:22 GMT
Bug them, I cannot force anything upstream.
Comment by John (graysky) - Saturday, 14 September 2013, 10:31 GMT
I emailed both devs mentioned in linux-3.11/MAINTAINERS section=netdev again. Could be that I am the only Archer using this driver or that this bug is only affecting me... guess we'll see when 3.11 goes into [core], eh?
Comment by John (graysky) - Saturday, 14 September 2013, 16:37 GMT
I opened a bugzilla report in addition to the direct email/netdev mailing list post.[1]

1. https://bugzilla.kernel.org/show_bug.cgi?id=61291

Comment by Rostislav Krasny (rosti) - Thursday, 19 September 2013, 17:01 GMT
I've made a fix based on a different change, published by Francois Romieu. See attached file.
I propose using this fix until the regression is fixed upstream.
I've tested it on my computer with an onboard 88E8001 NIC:

02:09.0 Ethernet controller [0200]: Marvell Technology Group Ltd. 88E8001 Gigabit Ethernet Controller [11ab:4320] (rev 13)
Comment by John (graysky) - Thursday, 19 September 2013, 18:59 GMT
Nice, thanks. I see there is actually a pretty robust discussion on this topic.[1][2] I will try this patch on my aged server and report back.

1. https://bugzilla.redhat.com/show_bug.cgi?id=1008323
2. https://lkml.org/lkml/2013/9/18/64
Comment by Rostislav Krasny (rosti) - Thursday, 19 September 2013, 19:26 GMT
Yes, there is another version of the skge.c patch:
http://permalink.gmane.org/gmane.linux.network/284140
It may or may not be better than the patch from Francois Romieu.
After all, there are many people, I guess, who are waiting for any working fix.
Let's do it quickly. Unlike with Red Hat, this bug has broken a _stable_ Arch branch.
Next time, let's test new kernel releases longer.
Comment by John (graysky) - Thursday, 19 September 2013, 19:31 GMT
@Rostislav - Not sure which patch is better. I am testing the one you posted above now on several machines and will report back here shortly. To your point about more rigorous testing, and just for the record, see my post #3 in this bug report regarding the stability of unpatched 3.11.x :p
Comment by John (graysky) - Thursday, 19 September 2013, 19:50 GMT
Tested the patch from post #5 on the following Arches/hardware and have experienced no panics or initial problems. I can also `ping -s 501 www.google.com` without error.

x86_64/skge: 1.14 addr 0xfcff8000 irq 19 chip Yukon rev 1
i686/skge: 1.14 addr 0xd5000000 irq 17 chip Yukon-Lite rev 7
i686/skge: 1.14 addr 0xdc800000 irq 10 chip Yukon-Lite rev 9
Comment by mbone (mbone) - Friday, 20 September 2013, 04:42 GMT
You're not the only one. It bit me too.

[edited to remove redundant urls]
Comment by Rostislav Krasny (rosti) - Friday, 20 September 2013, 19:55 GMT
Any progress on an official release of a fixed 3.11.1 kernel? Now is the time to be quick, not after you release a kernel with a known critical bug to [core].

By the way, Mikulas Patocka has published a patch on the upstream netdev mailing list that is very similar to the one from Francois Romieu that I used in the 5th comment:

http://marc.info/?l=linux-netdev&m=137969961327188&w=2
Comment by John (graysky) - Friday, 20 September 2013, 20:02 GMT
I too would like to see it incorporated before 3.11.2-1. Upstream hasn't even assembled this patchset for testing in 3.11.2-rc1 yet, and I can find no documentation that this patch has been accepted for inclusion.[1]

1. https://git.kernel.org/cgit/linux/kernel/git/stable/stable-queue.git/tree/queue-3.11/
Comment by Rostislav Krasny (rosti) - Saturday, 21 September 2013, 14:19 GMT
In the forum Allan stated that: "it is very rare that we pull in a patch that has not been accepted upstream".
https://bbs.archlinux.org/viewtopic.php?pid=1327416#p1327416
That topic is closed, so I'm asking here. Is that the reason no patch has been accepted yet?
Do you think that waiting for upstream and leaving users with a non-working network (or, in some hardware configurations, even with constant kernel panics) is better than shipping a temporary, already-tested workaround?
Comment by John (graysky) - Saturday, 21 September 2013, 14:23 GMT
If the Arch devs do not want to use an unapproved patch in the kernel package, you can switch over to linux-lts, which has been bumped to the 3.10.x series. At least for now, this bug is not present in 3.10.12, so you can use it while you wait for upstream to fix it.
Comment by Rostislav Krasny (rosti) - Saturday, 21 September 2013, 14:43 GMT
I use the patch from the 5th comment. So my system is in a good state. I asked about other users. Consider somebody with Marvell Yukon NIC is running 'pacman -Syu' right now. What user experience will he/she get?
Comment by John (graysky) - Sunday, 22 September 2013, 09:54 GMT
It would seem as though the original patch in post #5 has grown a bit.[1]

EDIT: Wait, I see that the patch is in the 3.12 tree and needs to be backported to 3.11...

1. http://permalink.gmane.org/gmane.linux.network/284277
Comment by Rostislav Krasny (rosti) - Sunday, 22 September 2013, 10:17 GMT
John, this is exactly the same email from Mikulas Patocka I've published a link to in the above comment:
https://bugs.archlinux.org/task/36865#comment114447
Comment by John (graysky) - Sunday, 22 September 2013, 10:24 GMT
@rosti - You are right; I missed that. I see that the Linus branch on github has been updated as well[1] but that code differs from your post in comment #5.

1. https://github.com/torvalds/linux/commit/c194992cbe71c20bb3623a566af8d11b0bfaa721
Comment by Rostislav Krasny (rosti) - Sunday, 22 September 2013, 11:59 GMT
John, this (in Linus's branch) is the change that Mikulas Patocka said isn't good. This is what he wrote:

"In my patch c194992cbe71c20bb3623a566af8d11b0bfaa721 I didn't fix the skge bug correctly"

c194992cbe71c20bb3623a566af8d11b0bfaa721 is also part of the URL you referenced.
Comment by John (graysky) - Sunday, 22 September 2013, 13:13 GMT
I am unable to get the patch you referenced[1] to apply cleanly to 3.11.1:
patching file drivers/net/ethernet/marvell/skge.c
Hunk #1 FAILED at 3086.
Hunk #2 succeeded at 3098 with fuzz 2 (offset -3 lines).
1 out of 2 hunks FAILED -- saving rejects to file drivers/net/ethernet/marvell/skge.c.rej

1. http://marc.info/?l=linux-netdev&m=137969961327188&w=2
Comment by John (graysky) - Sunday, 22 September 2013, 15:19 GMT
Ah, I see that it must be an incremental patch. In other words, first apply part #1 and then apply part #2. That works. For simplicity's sake, I have attached three files:

1) The first patch (already committed to the Linus github).
2) The 2nd patch (designed to apply on top of the first I reckon).
3) A combined patch which is simply diffing 3.11.1 against 3.11.1-patched thus combining the two.

Do I have this right?
