FS#55744 - [linux] 4.13 hard freezes the computer with xorg and nouveau

Attached to Project: Arch Linux
Opened by Antonio Corbi bellot (acorbi) - Monday, 25 September 2017, 11:02 GMT
Last edited by Doug Newgard (Scimmia) - Saturday, 30 September 2017, 15:32 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To No-one
Architecture All
Severity Critical
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 5
Private No

Details

Description:
After upgrading to kernels 4.13.x (linux and linux-zen behave the same with respect to this report) and booting with xdm enabled,
the computer hard freezes with a black screen at the point where the xdm login screen should appear (I have to press the power button to restart it).

This does not happen with gdm on any of those kernels; the computer boots OK and works as expected.
xdm also works OK with linux-lts.

Additional info:
* package version(s)
xorg-xdm: 1.1.11-6
linux, linux-zen: 4.13.3-1
linux-lts: 4.9.51-1
graphics card: NVIDIA Corporation GK107 [GeForce GT 640] (rev a1)

* config and/or log files etc.


Steps to reproduce:

Closed by  Doug Newgard (Scimmia)
Saturday, 30 September 2017, 15:32 GMT
Reason for closing:  Duplicate
Additional comments about closing:   FS#55629 
Comment by loqs (loqs) - Monday, 25 September 2017, 11:24 GMT
If you disable xdm and use startx instead does that also cause a system lockup?
Comment by Antonio Corbi bellot (acorbi) - Monday, 25 September 2017, 11:24 GMT
I decided to test/try other login managers: lxdm, sddm.
The results are the same as with xdm. They fail with linux/linux-zen. So it's not an xdm problem.

It seems that the only login manager I can use with current linux/linux-zen is gdm.
Comment by Antonio Corbi bellot (acorbi) - Monday, 25 September 2017, 11:30 GMT
> If you disable xdm and use startx instead does that also cause a system lockup?

Yes, hard freeze.
I attach the Xorg.0.log that startx generated.
Comment by loqs (loqs) - Monday, 25 September 2017, 11:40 GMT
I suspect gdm is not affected by the same issue because it defaults to Wayland rather than X11.
At a guess does adding the boot parameter intel_iommu=off prevent the lock up?
Comment by Antonio Corbi bellot (acorbi) - Monday, 25 September 2017, 13:41 GMT
Hi! I don't have the machine with me ATM.
As soon as I get back to it I'll report.
Thanks for your help.
Comment by Doug Newgard (Scimmia) - Tuesday, 26 September 2017, 02:42 GMT
That log doesn't look like a hard freeze. Have you switched to another TTY?
Comment by Antonio Corbi bellot (acorbi) - Tuesday, 26 September 2017, 07:51 GMT
> At a guess does adding the boot parameter intel_iommu=off prevent the lock up?
Yes, with this parameter xdm/lxdm/etc... work as expected

> That log doesn't look like a hard freeze. Have you switched to another TTY?
It's the log I obtained by stopping gdm from a console (sudo systemctl stop gdm) and running startx from that same console.
Comment by loqs (loqs) - Tuesday, 26 September 2017, 14:31 GMT
Possibly FS#55629; does intel_iommu=igfx_off also allow X to work as expected?
Comment by Antonio Corbi bellot (acorbi) - Wednesday, 27 September 2017, 07:58 GMT
> does intel_iommu=igfx_off also allow X to work as expected?

Yes, with this parameter it also works.
Is it preferred over intel_iommu=off then?
Comment by loqs (loqs) - Wednesday, 27 September 2017, 15:08 GMT
https://en.wikipedia.org/wiki/Input%E2%80%93output_memory_management_unit covers the advantages and disadvantages of using IOMMU.
With intel_iommu=igfx_off, the IOMMU is enabled except for the integrated GPU.
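
To confirm which intel_iommu options a running system was actually booted with, the kernel command line can be inspected. The following is a minimal Python sketch, not something posted in this task; the function name intel_iommu_options is hypothetical, and it assumes /proc/cmdline is readable on the machine.

```python
from typing import List


def intel_iommu_options(cmdline: str) -> List[str]:
    """Return every option passed via intel_iommu=... on a kernel command line.

    Returns an empty list when the parameter is absent.
    """
    options: List[str] = []
    for token in cmdline.split():
        if token.startswith("intel_iommu="):
            # Options may be comma-separated, e.g. intel_iommu=on,igfx_off
            value = token.split("=", 1)[1]
            options.extend(part for part in value.split(",") if part)
    return options


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            current = f.read()
    except OSError:
        # Not on Linux, or /proc is unavailable
        current = ""
    found = intel_iommu_options(current)
    print(found if found else "intel_iommu is not set on the kernel command line")
```

A result containing "off" or "igfx_off" means one of the workarounds discussed in this task is in effect for the current boot.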
Comment by treeshateorcs (budkin) - Thursday, 28 September 2017, 09:05 GMT
I'm having the same issue with GDM. I can't do anything on the most recent stable kernel (4.13.3), not even switch to another TTY. I had to downgrade to 4.12. My hardware is a ThinkPad T450s with an i7 (5600U) and Intel HD 5500.
Comment by treeshateorcs (budkin) - Thursday, 28 September 2017, 09:54 GMT
"intel_iommu=off" helps
Comment by Justus S (FlashG0rd0n) - Thursday, 28 September 2017, 10:05 GMT
Same here with a ThinkPad T450s with an i7 5600U and Intel HD 5500: hard freeze.

journalctl says something like:
[drm:drm_atomic_helper_swap_state [drm_kms_helper]] *ERROR* [CRTC:32:pipe A] hw_done timed out
drm/i915: Resetting chip after gpu hang
[drm] GPU crash dump saved to /sys/class/drm/card0/error
[drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[drm] GPU HANG: ecode 8:0:0x85dffffb, in plasmashell [817], reason: Hang on rcs0, action: reset
DMAR: [DMA Read] Request device [00:02.0] fault addr bc2000 [fault reason 05] PTE Write access is not set
[...]
DMAR: DRHD: handling fault status reg 3

Downgrading to the 4.12.13-1 kernel is a temporary workaround.
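
For anyone comparing their own logs against the excerpt above, DMAR fault lines can be pulled out of saved kernel log text with a small script. This is a rough Python sketch under stated assumptions: the regular expression is modelled only on the single DMAR fault line quoted in this comment, other kernel versions may format the message differently, and the names DMAR_FAULT and find_dmar_faults are hypothetical.

```python
import re

# Modelled on the one fault line quoted in this task; treat the format as an assumption.
DMAR_FAULT = re.compile(
    r"DMAR: \[DMA (?P<op>Read|Write)\] Request device \[(?P<device>[0-9a-f:.]+)\] "
    r"fault addr (?P<addr>[0-9a-f]+) \[fault reason (?P<reason>\d+)\] (?P<text>.*)"
)


def find_dmar_faults(log_text):
    """Return one dict per DMAR fault line found in the given log text."""
    faults = []
    for line in log_text.splitlines():
        match = DMAR_FAULT.search(line)
        if match:
            faults.append(match.groupdict())
    return faults


if __name__ == "__main__":
    sample = (
        "[drm] GPU HANG: ecode 8:0:0x85dffffb, in plasmashell [817], "
        "reason: Hang on rcs0, action: reset\n"
        "DMAR: [DMA Read] Request device [00:02.0] fault addr bc2000 "
        "[fault reason 05] PTE Write access is not set\n"
    )
    for fault in find_dmar_faults(sample):
        print(fault["device"], fault["op"], fault["addr"], fault["text"])
```

The input could be, for example, the output of journalctl -k saved to a file and read in as text. Grouping the results by device shows whether the faults all come from the GPU.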
Comment by Axel Kellermann (akellerm) - Friday, 29 September 2017, 21:26 GMT
Same on my Thinkpad T520 (i7-2620M, HD3000).

Error in my case is:
[drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:32:pipe A] flip_done timed out

I can work around the freezes by disabling the GPU IOMMU or downgrading to kernel 4.12.13-1.

Someone with appropriate permissions should probably change the headline of this bug, as it isn't specific to Nvidia hardware or the nouveau driver.
Comment by loqs (loqs) - Friday, 29 September 2017, 21:58 GMT
@FlashG0rd0n and @akellerm, there is already FS#55629 for issues where the GPU in question uses the i915 module.
As acorbi's system does not log any output related to the issue before freezing, it may or may not be FS#55629.
Comment by alexander sanoll (sonix07) - Saturday, 30 September 2017, 10:50 GMT
I'm having a similar issue with two of my computers.
My workaround at the moment is to disable the VT-d (virtualization) feature in my BIOS (so no KVM for now).
My Sandy Bridge machine gave me a kernel panic straight away, while my Haswell machine started freezing randomly after boot.
The Sandy Bridge uses nvidia drivers, but the Haswell has only Intel graphics.
The kernel panic said something about IOMMU, which is related to VT-d.
