FS#63870 - [linux][wireguard] causes system freeze after latest updates
Attached to Project: Arch Linux
Opened by mike (mbalajew) - Saturday, 21 September 2019, 02:23 GMT
Last edited by Christian Hesse (eworm) - Tuesday, 08 October 2019, 07:43 GMT
Details
System freezes after running `wg-quick up`.
Additional info:
linux 5.3.arch1-1
wireguard-tools 0.0.20190913-1
wireguard-arch 0.0.20190913-2
Steps to reproduce:
1. Run `wg-quick up <some wireguard conf file>`.
2. The system doesn't always freeze immediately; sometimes it takes a few minutes.
Closed by Christian Hesse (eworm)
Tuesday, 08 October 2019, 07:43 GMT
Reason for closing: Fixed
Additional comments about closing: linux 5.3.4.arch1-1
[Fri Sep 20 20:53:23 2019] wireguard: WireGuard 0.0.20190913 loaded. See www.wireguard.com for information.
[Fri Sep 20 20:53:23 2019] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
[Fri Sep 20 20:53:38 2019] dst_release: dst:00000000ac66f553 refcnt:-1
[Fri Sep 20 20:53:38 2019] dst_release: dst:00000000ac66f553 refcnt:-2
[Fri Sep 20 20:53:38 2019] dst_release: dst:00000000ac66f553 refcnt:-3
[Fri Sep 20 20:53:38 2019] dst_release: dst:00000000ac66f553 refcnt:-4
[Fri Sep 20 20:53:38 2019] dst_release: dst:00000000ac66f553 refcnt:-5
[Fri Sep 20 20:53:38 2019] dst_release: dst:00000000ac66f553 refcnt:-6
[Fri Sep 20 20:53:38 2019] dst_release: dst:00000000ac66f553 refcnt:-7
[Fri Sep 20 20:53:38 2019] dst_release: dst:000000001b776db6 refcnt:-1
[Fri Sep 20 20:53:38 2019] BUG: kernel NULL pointer dereference, address: 0000000000000000
[Fri Sep 20 20:53:38 2019] #PF: supervisor read access in kernel mode
[Fri Sep 20 20:53:38 2019] #PF: error_code(0x0000) - not-present page
[Fri Sep 20 20:53:38 2019] PGD 0 P4D 0
[Fri Sep 20 20:53:38 2019] Oops: 0000 [#1] PREEMPT SMP PTI
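The repeated `dst_release` lines above appear to show route-cache (dst) entries whose reference count drops below zero, i.e. they are released more often than they were held. When comparing logs from several test kernels, a small filter can summarize these warnings. This is a sketch: `summarize_dst_underflow` is a hypothetical helper, not an existing tool, and it assumes the exact `dst:<ptr> refcnt:<n>` format shown in this log.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize "dst_release: dst:<ptr> refcnt:<n>" warnings
# from saved `dmesg -T` output (file arguments or stdin).
# Prints one line per dst entry: <ptr> <number of warnings> <lowest refcnt seen>
summarize_dst_underflow() {
  awk '
    /dst_release: dst:/ {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^dst:/)    dst = substr($i, 5)      # strip "dst:"
        if ($i ~ /^refcnt:/) rc  = substr($i, 8) + 0  # strip "refcnt:", force numeric
      }
      if (!(dst in low) || rc < low[dst]) low[dst] = rc
      count[dst]++
    }
    END { for (d in low) printf "%s %d %d\n", d, count[d], low[d] }
  ' "$@"
}
```

Run against the log above, it would report `00000000ac66f553` with 7 warnings down to -7 and `000000001b776db6` with one warning at -1. Output order is unspecified (awk array iteration), so pipe through `sort` when diffing runs.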
[1] https://www.wireguard.com/#contact-the-team
Also, can you try applying this fix? https://lore.kernel.org/netdev/20190919171236.111294-1-edumazet%40google.com/
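As far as we can tell, the linked patch is a one-character fix in `fib6_rule_lookup()`: the condition `!(flags | RT6_LOOKUP_F_DST_NOREF)` should read `!(flags & RT6_LOOKUP_F_DST_NOREF)`. Because the flag is non-zero, the OR expression is never zero, so the negated test is always false and whatever it guards (a `dst_hold()` in the kernel source, as we understand it) never runs. The sketch below reproduces only that boolean logic in bash arithmetic; the flag value `0x80` is assumed for illustration and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Assumed flag value, for illustration only.
RT6_LOOKUP_F_DST_NOREF=$((0x80))

# Buggy check: "take a reference" only if (flags | NOREF) == 0.
# NOREF is non-zero, so this can never succeed.
needs_hold_buggy() { (( !($1 | RT6_LOOKUP_F_DST_NOREF) )); }

# Fixed check: take a reference only when the caller did NOT pass NOREF.
needs_hold_fixed() { (( !($1 & RT6_LOOKUP_F_DST_NOREF) )); }

for flags in 0 "$RT6_LOOKUP_F_DST_NOREF"; do
  needs_hold_buggy "$flags" && buggy=yes || buggy=no
  needs_hold_fixed "$flags" && fixed=yes || fixed=no
  echo "flags=$flags buggy_hold=$buggy fixed_hold=$fixed"
done
```

This prints `flags=0 buggy_hold=no fixed_hold=yes` and `flags=128 buggy_hold=no fixed_hold=no`: with the buggy operator no reference is taken even when the caller asked for one. That would be consistent with later over-releases, but this thread does not confirm it is the cause of the crash above.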
Note that until WireGuard is merged into the kernel, it is officially an unsupported module that taints the kernel.
Alternatively, bisecting between 5.2 and 5.3 should locate the causal commit, which could then be reported upstream.
You could try limiting the bisect to the `net` path to speed up the process.
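A rough outline of such a path-limited bisect, assuming a mainline kernel git checkout with the `v5.2` and `v5.3` tags. `bisect_net_regression` is a hypothetical helper name; building, installing and booting each candidate (and rebuilding the out-of-tree WireGuard module against it) stays a manual step.

```shell
#!/usr/bin/env bash
# Hypothetical helper; run from the root of a mainline kernel git checkout.
bisect_net_regression() {
  # v5.3 freezes ("bad"), v5.2 works ("good"). The pathspec after "--"
  # restricts the search to commits that touch files under net/.
  git bisect start v5.3 v5.2 -- net || return
  echo "Build and boot the checked-out commit, run 'wg-quick up', then mark it:"
  echo "  git bisect good   # no freeze"
  echo "  git bisect bad    # freeze or oops"
  echo "When git prints the first bad commit, finish with: git bisect reset"
}
```

Commits that only touch headers such as `include/net/` are skipped by this pathspec; passing `-- net include/net` widens the search at the cost of a few more steps.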
Edit:
To save possible wasted effort: how did you apply the patch from the mailing list?
prepare() {
cd "$_srcname"
# apply the mailing-list fix: on line 321, replace '|' with '&'
sed -i "321s/|/\&/g" net/ipv6/ip6_fib.c
...
git clone --single-branch -b packages/linux https://projects.archlinux.org/svntogit/packages.git
and then modified the PKGBUILD in the repos/core-x86_64 directory.
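Patching by line number (`321s/|/\&/g`) silently edits the wrong line, or nothing, if the file shifts between releases. A slightly safer sketch matches the expression itself and fails loudly when it is not found. The helper name `fix_noref_typo` is hypothetical, and the exact source text `!(flags | RT6_LOOKUP_F_DST_NOREF)` is an assumption about what line 321 contains.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace the '|' typo with '&' by matching the expression
# rather than a line number, and fail if the buggy text is not present.
# Assumes GNU sed (-i without a suffix argument), as shipped on Arch.
fix_noref_typo() {
  local file=$1
  local buggy='!(flags | RT6_LOOKUP_F_DST_NOREF)'
  if ! grep -qF -- "$buggy" "$file"; then
    echo "fix_noref_typo: '$buggy' not found in $file" >&2
    return 1
  fi
  # In a basic regex, '(', ')', '|' and '!' are literal; '\&' is a literal '&'.
  sed -i 's/!(flags | RT6_LOOKUP_F_DST_NOREF)/!(flags \& RT6_LOOKUP_F_DST_NOREF)/' "$file"
}
```

Defined inside the PKGBUILD, it would replace the line-number `sed` call in `prepare()` with `fix_noref_typo net/ipv6/ip6_fib.c`, and the build stops instead of continuing unpatched if the expression is absent.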
git revert d64a1f574a2957b4bcb06452d36cc1c6bf16e9fc
git revert -m 1 7d30a7f6424e88c958c19a02f6f54ab8d25919cd
A patch of the resulting diff is attached. If you are already bisecting, ignore this.
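One way to feed such a diff into the `linux` PKGBUILD, sketched under assumptions: `tmp.diff` has been added to the `source=()` array so makepkg copies it into `$srcdir`, and `$_srcname` is the kernel source directory the PKGBUILD already defines. This is not the official PKGBUILD's `prepare()`.

```shell
#!/usr/bin/env bash
# Sketch of a PKGBUILD prepare() step. makepkg sets $srcdir; $_srcname comes
# from the PKGBUILD. tmp.diff is assumed to be listed in source=().
prepare() {
  cd "$srcdir/$_srcname" || return
  # git diff output uses a/ and b/ prefixes, so strip one component (-p1).
  # -N (--forward) ignores patches that look reversed or already applied.
  patch -Np1 -i "$srcdir/tmp.diff"
}
```

After adding the file, `updpkgsums` (from pacman-contrib) refreshes the checksums and `makepkg -s` builds the package.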
On Gentoo I'm experiencing the same crash. Applying your tmp.diff fixed the bug for me. Wireguard with IPv6 looks to be working fine now. Thanks.
What should we do to report this upstream?
d64a1f574a29 ipv6: honor RT6_LOOKUP_F_DST_NOREF in rule lookup logic
7d30a7f6424e Merge branch 'ipv6-avoid-taking-refcnt-on-dst-during-route-lookup' (merge commit; it contains the commits below, so it can be skipped when reverting the commits one by one)
74109218b051 ipv6: initialize rt6->rt6i_uncached in all pre-allocated dst entries
7d9e5f422150 ipv6: convert major tx path to use RT6_LOOKUP_F_DST_NOREF
0e09edcce7ad ipv6: introduce RT6_LOOKUP_F_DST_NOREF flag in ip6_pol_route()
67f415dd2906 ipv6: convert rx data path to not take refcnt on dst
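Reverting the listed commits one by one (skipping the merge) and exporting the result as a single diff could look like the sketch below. It assumes a branch checked out at `v5.3` in a kernel git tree and that the list above is newest first, as `git log` prints it; `revert_series` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: revert each commit in the order given (newest first),
# then write the combined change relative to BASE into one diff file.
# Usage: revert_series BASE OUTFILE COMMIT...
revert_series() {
  local base=$1 out=$2 c
  shift 2
  for c in "$@"; do
    # --no-edit keeps git's default "Revert ..." message; stop on a conflict.
    git revert --no-edit "$c" || return
  done
  git diff "$base" HEAD > "$out"
}
```

For example, `revert_series v5.3 tmp.diff d64a1f574a29 74109218b051 7d9e5f422150 0e09edcce7ad 67f415dd2906`; leaving `67f415dd2906` off the argument list gives a narrower diff.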
Updated tmp.diff without 67f415dd2906 to check if that is the cause.
@mbalajew: 5.3.1 does not contain any of the changes from tmp.diff; see https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.3.1
I observed that too: it works great on the company Wi-Fi and cable, but crashes at home and in the hackerspace.
@loqs:
So far I have compiled the kernel with debugging symbols and run gdb on it as described here: https://www.kernel.org/doc/html/latest/admin-guide/bug-hunting.html
And this is what gdb told me:
(gdb) l *fib6_rule_action+0xda
0xffffffff819e6cba is in fib6_rule_action (./include/net/ip6_fib.h:212).
207 for (rt = (w)->leaf; rt; \
208 rt = rcu_dereference_protected(rt->fib6_next, 1))
209
210 static inline struct inet6_dev *ip6_dst_idev(struct dst_entry *dst)
211 {
212 return ((struct rt6_info *)dst)->rt6i_idev;
213 }
214
215 static inline void fib6_clean_expires(struct fib6_info *f6i)
216 {
(gdb) l *fib6_rule_action+0xe0
0xffffffff819c8490 is in fib6_rule_action (./include/net/ip6_fib.h:212).
207 for (rt = (w)->leaf; rt; \
208 rt = rcu_dereference_protected(rt->fib6_next, 1))
209
210 static inline struct inet6_dev *ip6_dst_idev(struct dst_entry *dst)
211 {
212 return ((struct rt6_info *)dst)->rt6i_idev;
213 }
214
215 static inline void fib6_clean_expires(struct fib6_info *f6i)
216 {
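The lookups above can be scripted, assuming a `vmlinux` built with debug info as in the linked kernel.org guide. `locate_oops_line` is a hypothetical wrapper; `-batch` and `-ex` are standard gdb options, and `list *(...)` is the same command used above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the gdb lookup shown above.
# $1: symbol+offset from the oops, e.g. fib6_rule_action+0xda
# $2: path to the matching vmlinux with debug info (default: ./vmlinux)
locate_oops_line() {
  local where=$1 vmlinux=${2:-vmlinux}
  gdb -batch -ex "list *($where)" "$vmlinux"
}
```

Usage: `locate_oops_line fib6_rule_action+0xda ./vmlinux`. The `vmlinux` has to come from the exact build that produced the oops; the two lookups above resolve to quite different addresses (0xffffffff819e6cba and 0xffffffff819c8490), which suggests they were run against different builds.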
Reduced the number of reverted commits again; now only three commits are reverted.
Please test if the latest tmp.diff still works.
I also reported my findings on #wireguard on freenode and zx2c4 is looking into it too.
So can we mark this resolved? WireGuard works for me now on 5.3.5.