Arch Linux

FS#63909 - [linux] 5.3 Random freezes

Attached to Project: Arch Linux
Opened by Daniel Holz (holzi) - Tuesday, 24 September 2019, 18:45 GMT
Task Type: Bug Report
Category: Packages: Core
Status: Unconfirmed
Assigned To: No-one
Architecture: x86_64
Severity: High
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 0%
Votes: 5
Private: No

Details

Description:
Random freezes since 5.3. They last from one to around 15 seconds. The whole system becomes unresponsive and the screen freezes. Reinstalling kernel 5.2.14 fixes the issue.
They appear on both battery and AC power. I tried disabling tlp but that did not help.


Additional info:
I added the output of journalctl for the timespan of the freezes.

Steps to reproduce:
I just have to use my computer. Sometimes the freezes come early, sometimes only after days. When they appear, they are triggered by file system operations or videos in chromium.
   10min.txt (430.6 KiB)

Comment by Jolan Luff (jolan) - Thursday, 26 September 2019, 13:40 GMT
I'm seeing the same behavior with 5.3 and 5.3.1.

For me it is immediately triggered when under heavy i/o load, i.e. moving 15GB of data from one SATA SSD to another SATA SSD, or extracting large archives.

This is on a desktop machine (i7-4790k) using the integrated graphics. Will try to reproduce on my other machines.
Comment by Quido Meijer (Aapzak) - Sunday, 29 September 2019, 09:36 GMT
Same issue here. Switching back to LTS kernel solved it for me.

I'm using root on zfs btw, curious if other people with these freeze issues are using zfs too.

Comment by Daniel Holz (holzi) - Sunday, 29 September 2019, 09:48 GMT
No I'm using ext4.
Can someone try the zen kernel? I have been using it for 5 days now and so far there have been no freezes for me.
Comment by Quido Meijer (Aapzak) - Sunday, 29 September 2019, 09:56 GMT
Can't help you there. Entering pagerduty shift, I need my laptop to behave for a week.
Comment by Andy (freakyc) - Sunday, 29 September 2019, 11:00 GMT
I also had freezing with kernel 5.3 but it only seemed to be with some applications. Downgrading to kernel 5.2.14 also got rid of the freezing for me. My freezing was mostly noticeable using certain applications. Firefox specifically was really bad but it also happened under keepassxc. Chromium seemed to work fine.

Another user on the forums noted the same symptoms I was experiencing: https://bbs.archlinux.org/viewtopic.php?pid=1865957#p1865957
Comment by Christopher Dragt (verstandpasta) - Sunday, 29 September 2019, 20:54 GMT
Same here. 5 seconds of heavy I/O load (copying a disk image from/to an SSD) is enough to completely freeze my system.

I got it "fixed" by disabling any swap-file/partition. Changing swappiness settings didn't help.

Maybe this helps anyone, now running linux-lts because I need my swap space back...

Comment by Constantine (Hi-Angel) - Tuesday, 01 October 2019, 13:42 GMT
Same here, downgrade to 5.2 helped too. I think I have disabled swap at some point while running 5.3, but that didn't help.
Comment by Paulo Coelho (prscoelho) - Thursday, 03 October 2019, 18:58 GMT
Same issue here, periodic stutters while under a bit of load. Using xorg and intel graphics.

Tried disabling tlp, which didn't help and then disabling swap fixed it. There was nothing in journalctl while stutters were happening.
Comment by loqs (loqs) - Thursday, 03 October 2019, 20:46 GMT
Has anyone affected bisected the issue or tested 5.4-rc1?
Comment by Jolan Luff (jolan) - Sunday, 06 October 2019, 15:23 GMT
Seems to be fixed by 5.3.4. I was looking through the changelog and I think this may be the commit that fixed it:

commit 1e04eb03877c3e0a38c1be1845be97074a1198b6
Author: Damien Le Moal <damien.lemoal@wdc.com>
Date: Wed Aug 28 13:40:20 2019 +0900

block: mq-deadline: Fix queue restart handling

commit cb8acabbe33b110157955a7425ee876fb81e6bbc upstream.

Commit 7211aef86f79 ("block: mq-deadline: Fix write completion
handling") added a call to blk_mq_sched_mark_restart_hctx() in
dd_dispatch_request() to make sure that write request dispatching does
not stall when all target zones are locked. This fix left a subtle race
when a write completion happens during a dispatch execution on another
CPU:

CPU 0: Dispatch                 CPU1: write completion

dd_dispatch_request()
    lock(&dd->lock);
    ...
    lock(&dd->zone_lock);       dd_finish_request()
    rq = find request           lock(&dd->zone_lock);
    unlock(&dd->zone_lock);
                                zone write unlock
                                unlock(&dd->zone_lock);
                                ...
                                __blk_mq_free_request
                                    check restart flag (not set)
                                    -> queue not run
    ...
    if (!rq && have writes)
        blk_mq_sched_mark_restart_hctx()
    unlock(&dd->lock)

Since the dispatch context finishes after the write request completion
handling, marking the queue as needing a restart is not seen from
__blk_mq_free_request() and blk_mq_sched_restart() not executed leading
to the dispatch stall under 100% write workloads.

Fix this by moving the call to blk_mq_sched_mark_restart_hctx() from
dd_dispatch_request() into dd_finish_request() under the zone lock to
ensure full mutual exclusion between write request dispatch selection
and zone unlock on write request completion.

Fixes: 7211aef86f79 ("block: mq-deadline: Fix write completion handling")
Cc: stable@vger.kernel.org
Reported-by: Hans Holmberg <Hans.Holmberg@wdc.com>
Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
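
The ordering problem described in the commit message can be replayed as a toy model. This is only a sketch for illustration, not kernel code: the function name replay and its variables are made up here, and the two CPUs are reduced to a fixed sequence of steps in the order the commit message gives.

```python
def replay(fixed):
    """Replay the interleaving from the commit message in a toy model.

    Returns True if the completion path sees the restart flag and re-runs
    the queue, False if the pending write is left stalled.
    """
    zone_locked = True      # an in-flight write holds the target zone
    pending_writes = 1      # one more write is queued behind it
    restart_marked = False

    # CPU 0, dd_dispatch_request(): the zone is locked, so no request is found.
    rq_found = pending_writes > 0 and not zone_locked

    # CPU 1, dd_finish_request(): the in-flight write completes.
    zone_locked = False
    if fixed and pending_writes > 0:
        restart_marked = True   # fixed kernel: marked here, under the zone lock

    # CPU 1, __blk_mq_free_request(): the queue is re-run only if marked.
    queue_rerun = restart_marked

    # CPU 0, back in dd_dispatch_request(): the old code marks the restart now,
    # after the only completion that could have acted on it.
    if not fixed and not rq_found and pending_writes > 0:
        restart_marked = True

    return queue_rerun


if __name__ == "__main__":
    print("before fix, queue re-run:", replay(fixed=False))
    print("after fix, queue re-run:", replay(fixed=True))
```

In this model the old ordering leaves the queue un-run with a write still pending, which matches the dispatch stall under write-only workloads that the commit message describes.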
Comment by Constantine (Hi-Angel) - Monday, 07 October 2019, 10:37 GMT
> Seems to be fixed by 5.3.4

I'm not so sure. I'm on 5.3.4 right now, and I have had two freezes of 3-4 seconds each within a few hours. This looks better than before, but we will see how it works out.

> I think I have disabled swap at some point while running 5.3, but that didn't help

Btw, yeah, disabling swap seems to fix it. It's reproducible with swap enabled.
Comment by Andy (freakyc) - Saturday, 02 November 2019, 11:11 GMT
I tried upgrading to 5.3.8 after seeing this but I was still seeing the same freezing. I noticed in the thread I linked to above that someone mentioned removing xf86-video-intel, but I couldn't get X to start after I removed it. While I was trying to fix that, I came across the SNA issues section of the Intel graphics wiki page (https://wiki.archlinux.org/index.php/Intel_graphics#SNA_issues), which also mentions freezing issues. I switched to UXA and my freezing has stopped on 5.3.8 now.
Comment by moody tux (moodytux) - Thursday, 28 November 2019, 11:18 GMT
I can confirm this is still an issue with linux-5.3.8.1. My machine freezes for about 15 seconds at irregular intervals, seemingly when Qt-based apps are loaded. During freezes, kswapd is at the top of the top list, with high load averages of 6-8, before the system slowly becomes responsive and the load average drops again.

I tried disabling swap at runtime with 'swapoff /dev/sda3' which turned swap off but didn't fix the freezing. I downgraded to linux-5.2.9.arch1-1 (which was the latest 5.2 kernel I had in my package cache) and the problem has gone away, so it seems to be related to the 5.3 kernel. I run an nvidia card with the nouveau driver (and don't have the intel driver installed), so that isn't the cause of my problem.
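
For anyone comparing results with swap on and off: a minimal sketch, not taken from any of the reports above, for checking which swap areas are actually active. It assumes the usual column layout of /proc/swaps (Filename, Type, Size, Used, Priority); the function name parse_swaps is made up for this example.

```python
def parse_swaps(text):
    """Parse /proc/swaps content into a list of (device, size_kib, used_kib)."""
    entries = []
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) >= 4:
            entries.append((fields[0], int(fields[2]), int(fields[3])))
    return entries


if __name__ == "__main__":
    try:
        with open("/proc/swaps") as f:
            areas = parse_swaps(f.read())
    except OSError:
        areas = []  # not Linux, or /proc is unavailable
    if not areas:
        print("no active swap areas")
    for device, size_kib, used_kib in areas:
        print(f"{device}: {used_kib} of {size_kib} KiB used")
```

An empty list after swapoff confirms that no swap file or partition is still in use, which is worth checking before concluding that swap is unrelated.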
Comment by Matt (madalu) - Wednesday, 18 December 2019, 18:21 GMT
System lockups during heavy I/O are still a problem for me on the 5.4.3 kernel. I'm using the mq-scheduler with an SSD (ext4 on encrypted lvm). In my case the lockups are directly correlated with heavy disk writes. They disappear entirely when using linux-lts. I tested under exactly the same conditions (rsyncing large files from an external drive). With 5.4.3, the system would lock up repeatedly during this operation (often for upwards of 1-2 seconds). With linux-lts, there are no lock-ups transferring the same files. I've also been noticing significant lock-ups on the 5.3 and 5.4 kernels when pacman writes to disk during an update. This is happening on multiple Arch machines, each with very different hardware.
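
Since several reports here depend on which I/O scheduler is in use, here is a minimal sketch for listing the active scheduler per block device. It assumes the sysfs convention that /sys/block/<device>/queue/scheduler lists the available schedulers with the active one in square brackets; the function names are made up for this example.

```python
import re
from pathlib import Path


def active_scheduler(text):
    """Return the scheduler name shown in [brackets], or None if absent."""
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else None


def schedulers_by_device(sys_block=Path("/sys/block")):
    """Map each block device to its active scheduler (empty if sysfs is missing)."""
    result = {}
    for sched_file in sorted(sys_block.glob("*/queue/scheduler")):
        try:
            device = sched_file.parent.parent.name
            result[device] = active_scheduler(sched_file.read_text())
        except OSError:
            continue  # device vanished or is unreadable
    return result


if __name__ == "__main__":
    for dev, sched in schedulers_by_device().items():
        print(f"{dev}: {sched}")
```

Including this output in a report makes it easier to tell whether a given machine is using mq-deadline, the scheduler touched by the commit quoted earlier in this task.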
Comment by loqs (loqs) - Wednesday, 18 December 2019, 19:22 GMT
Is the issue also present with 5.5-rc2? Has the issue been reported upstream? [1]

[1] https://www.kernel.org/doc/html/latest/admin-guide/reporting-bugs.html
Comment by Matt (madalu) - Sunday, 22 December 2019, 01:06 GMT
I have not been able to replicate the issue I was experiencing while using kernel 5.5-rc2. The specific issue seemed to be lock-ups during heavy I/O when using an encrypted (dm-crypt) swap partition. That seems to have been solved in 5.5. But I can't speak for the other issues raised in this thread.
