FS#38912 - [linux] 3.13.x Complete X freeze with nouveau driver

Attached to Project: Arch Linux
Opened by Darren Davison (davison) - Friday, 14 February 2014, 15:07 GMT
Last edited by Doug Newgard (Scimmia) - Wednesday, 13 May 2015, 20:48 GMT
Task Type Bug Report
Category Upstream Bugs
Status Closed
Assigned To Tobias Powalowski (tpowa)
Thomas Bächler (brain0)
Architecture All
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 6
Private No

Details

Description:
The machine freezes and accepts no mouse or keyboard input. The display remains frozen. I am unable to change VT, but I can still SSH in, and everything looks normal from the SSH session. Only a reboot fixes this; killing the X process causes screen corruption and may briefly allow keyboard input, but then it locks up again.

Additional info:
Feb 14 14:23:45 hepburn kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
Feb 14 14:23:45 hepburn kernel: IP: [<ffffffffa010ca12>] nouveau_fence_wait_uevent.isra.1+0x22/0x440 [nouveau]
Feb 14 14:23:45 hepburn kernel: PGD 4197d9067 PUD 414332067 PMD 0
Feb 14 14:23:45 hepburn kernel: Oops: 0000 [#1] PREEMPT SMP
Feb 14 14:23:45 hepburn kernel: Modules linked in: vhost_net vhost macvtap macvlan tun fuse ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp bridge stp llc ip6table_filter ip6_tables iptable_filter ip_tables x_tables nfsv3 nfs_acl nfsv4 snd_hda_codec_hdmi snd_hda_codec_realtek x86_pkg_temp_thermal intel_powerclamp crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd joydev eeepc_wmi uvcvideo asus_wmi hid_generic iTCO_wdt snd_usb_audio sparse_keymap videobuf2_vmalloc iTCO_vendor_support videobuf2_memops rfkill videobuf2_core snd_usbmidi_lib usbhid snd_rawmidi videodev snd_seq_device hid media pcspkr sb_edac microcode edac_core
Feb 14 14:23:45 hepburn kernel: serio_raw i2c_i801 snd_hda_intel snd_hda_codec snd_hwdep snd_pcm e1000e snd_page_alloc snd_timer snd ptp mei_me soundcore pps_core shpchp mei lpc_ich processor evdev nfs lockd sunrpc fscache kvm_intel kvm dm_crypt dm_mod coretemp ext4 crc16 mbcache jbd2 usb_storage sd_mod sr_mod cdrom ata_generic pata_acpi ahci libahci ata_piix libata firewire_ohci ehci_pci xhci_hcd ehci_hcd firewire_core scsi_mod crc_itu_t usbcore usb_common nouveau button video mxm_wmi wmi i2c_algo_bit drm_kms_helper ttm drm i2c_core
Feb 14 14:23:45 hepburn kernel: CPU: 4 PID: 906 Comm: X Not tainted 3.12.9-2-ARCH #1
Feb 14 14:23:45 hepburn kernel: Hardware name: System manufacturer System Product Name/P9X79, BIOS 1103 04/10/2012
Feb 14 14:23:45 hepburn kernel: task: ffff8804184d22c0 ti: ffff88040faf0000 task.ti: ffff88040faf0000
Feb 14 14:23:45 hepburn kernel: RIP: 0010:[<ffffffffa010ca12>] [<ffffffffa010ca12>] nouveau_fence_wait_uevent.isra.1+0x22/0x440 [nouveau]
Feb 14 14:23:45 hepburn kernel: RSP: 0018:ffff88040faf1c20 EFLAGS: 00010282
Feb 14 14:23:45 hepburn kernel: RAX: 0000000000000000 RBX: ffff8803b17d7628 RCX: 0000000000000001
Feb 14 14:23:45 hepburn kernel: RDX: 0000000000000001 RSI: ffff8803b17d7630 RDI: ffff8803b17d7628
Feb 14 14:23:45 hepburn kernel: RBP: ffff88040faf1ca0 R08: 000000000000038a R09: 000000000000e200
Feb 14 14:23:45 hepburn kernel: R10: ffffffffa0148c40 R11: ffff88040faf1de0 R12: 0000000000000001
Feb 14 14:23:45 hepburn kernel: R13: 0000000000000000 R14: ffff8804183bb060 R15: ffff8803b17d7630
Feb 14 14:23:45 hepburn kernel: FS: 00007fa4e1ee3880(0000) GS:ffff88042fd00000(0000) knlGS:0000000000000000
Feb 14 14:23:45 hepburn kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 14 14:23:45 hepburn kernel: CR2: 0000000000000008 CR3: 000000040f320000 CR4: 00000000000427e0
Feb 14 14:23:45 hepburn kernel: Stack:
Feb 14 14:23:45 hepburn kernel: 0000000000000000 ffff88040faf1cf8 ffffffff814f47a6 ffff88040faf1de0
Feb 14 14:23:45 hepburn kernel: ffffffffa0148c40 000000000000e200 000000000000038a 0000000000000000
Feb 14 14:23:45 hepburn kernel: 0000000000000001 0000000000000001 0000000000000010 ffff8803b17d7600
Feb 14 14:23:45 hepburn kernel: Call Trace:
Feb 14 14:23:45 hepburn kernel: [<ffffffff814f47a6>] ? retint_kernel+0x26/0x30
Feb 14 14:23:45 hepburn kernel: [<ffffffffa010ceb6>] nouveau_fence_wait+0x86/0x1a0 [nouveau]
Feb 14 14:23:45 hepburn kernel: [<ffffffffa010eb15>] nouveau_bo_fence_wait+0x15/0x20 [nouveau]
Feb 14 14:23:45 hepburn kernel: [<ffffffffa0066911>] ttm_bo_wait+0x91/0x190 [ttm]
Feb 14 14:23:45 hepburn kernel: [<ffffffffa01141c7>] nouveau_gem_ioctl_cpu_prep+0x57/0xe0 [nouveau]
Feb 14 14:23:45 hepburn kernel: [<ffffffffa0011c62>] drm_ioctl+0x502/0x630 [drm]
Feb 14 14:23:45 hepburn kernel: [<ffffffff811bef08>] ? destroy_inode+0x38/0x60
Feb 14 14:23:45 hepburn kernel: [<ffffffff811ba1cf>] ? __d_free+0x3f/0x60
Feb 14 14:23:45 hepburn kernel: [<ffffffffa010a021>] nouveau_drm_ioctl+0x51/0x90 [nouveau]
Feb 14 14:23:45 hepburn kernel: [<ffffffff811b7375>] do_vfs_ioctl+0x2e5/0x4d0
Feb 14 14:23:45 hepburn kernel: [<ffffffff811a649e>] ? ____fput+0xe/0x10
Feb 14 14:23:45 hepburn kernel: [<ffffffff81081c04>] ? task_work_run+0xa4/0xe0
Feb 14 14:23:45 hepburn kernel: [<ffffffff811b75e1>] SyS_ioctl+0x81/0xa0
Feb 14 14:23:45 hepburn kernel: [<ffffffff814fbbed>] system_call_fastpath+0x1a/0x1f
Feb 14 14:23:45 hepburn kernel: Code: c3 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 41 57 49 89 f7 41 56 41 55 41 54 41 89 d4 53 48 89 fb 48 83 ec 58 48 8b 07 <48> 8b 48 08 48 8b 91 f8 00 00 00 4c 8b b1 b0 07 00 00 48 8b 42
Feb 14 14:23:45 hepburn kernel: RIP [<ffffffffa010ca12>] nouveau_fence_wait_uevent.isra.1+0x22/0x440 [nouveau]
Feb 14 14:23:45 hepburn kernel: RSP <ffff88040faf1c20>
Feb 14 14:23:45 hepburn kernel: CR2: 0000000000000008
Feb 14 14:23:45 hepburn kernel: ---[ end trace 88aeef95088a57d3 ]---
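
For anyone comparing their own logs against this one, here is a minimal Python sketch that pulls the fault address, the faulting function, and the call chain out of an oops in the format shown above. The function name `summarize_oops` and the regular expression are illustrative assumptions, not part of any kernel tooling, and the pattern is only written against the 3.12-era log layout in this report. Frames marked with `?` are entries the kernel found on the stack but could not confirm as part of the call chain, so the sketch skips them.

```python
import re

# Short excerpt of the oops from this report, used as sample input.
SAMPLE = """\
Feb 14 14:23:45 hepburn kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
Feb 14 14:23:45 hepburn kernel: IP: [<ffffffffa010ca12>] nouveau_fence_wait_uevent.isra.1+0x22/0x440 [nouveau]
Feb 14 14:23:45 hepburn kernel: Call Trace:
Feb 14 14:23:45 hepburn kernel: [<ffffffff814f47a6>] ? retint_kernel+0x26/0x30
Feb 14 14:23:45 hepburn kernel: [<ffffffffa010ceb6>] nouveau_fence_wait+0x86/0x1a0 [nouveau]
Feb 14 14:23:45 hepburn kernel: [<ffffffffa010eb15>] nouveau_bo_fence_wait+0x15/0x20 [nouveau]
Feb 14 14:23:45 hepburn kernel: Code: c3 0f 1f 84
"""

# One frame: "[<address>] [? ]symbol+0xoffset/0xsize [module]"
# (hypothetical pattern, assumed from the log lines in this report)
FRAME = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<guess>\?\s+)?"
    r"(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(?P<mod>\w+)\])?"
)


def summarize_oops(text):
    """Return (fault_address, faulting_symbol, confirmed_call_trace)."""
    fault_addr = None
    fault_sym = None
    trace = []
    in_trace = False
    for line in text.splitlines():
        # Drop the syslog prefix ("Feb 14 ... hepburn kernel: ").
        msg = line.split("kernel: ", 1)[-1]
        m = re.search(r"NULL pointer dereference at ([0-9a-f]+)", msg)
        if m:
            fault_addr = int(m.group(1), 16)
            continue
        if msg.startswith("IP:"):
            f = FRAME.search(msg)
            if f:
                fault_sym = f.group("sym")
            continue
        if msg.startswith("Call Trace:"):
            in_trace = True
            continue
        if in_trace:
            f = FRAME.match(msg)
            if f is None:
                in_trace = False      # first non-frame line ends the trace
            elif not f.group("guess"):
                trace.append(f.group("sym"))  # keep only confirmed frames
    return fault_addr, fault_sym, trace


if __name__ == "__main__":
    addr, sym, trace = summarize_oops(SAMPLE)
    print(hex(addr), sym, " <- ".join(trace))
```

Comparing the faulting symbol and the first few confirmed frames is usually enough to tell whether two reports hit the same code path, even when module load addresses differ between machines.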


Steps to reproduce:
Happens at random; cannot be tied to any particular steps.

Closed by  Doug Newgard (Scimmia)
Wednesday, 13 May 2015, 20:48 GMT
Reason for closing:  Fixed
Additional comments about closing:  linux 3.19.2-1
Comment by Gerardo Exequiel Pozzi (djgera) - Friday, 14 February 2014, 15:13 GMT
  • Field changed: Summary (Complete X freeze with nouveau driver → [linux] Complete X freeze with nouveau driver)
  • Field changed: Status (Unconfirmed → Waiting on Response)
  • Field changed: Category (Kernel → Upstream Bugs)
  • Field changed: Severity (Critical → High)
  • Task assigned to Thomas Bächler (brain0), Tobias Powalowski (tpowa)
upstream report?
Comment by Grisha Georgiev (gsg) - Sunday, 23 February 2014, 06:21 GMT
I have the same issue after the kernel update to linux-3.13.4-1-x86_64.
I use the integrated GeForce 6100 / nForce 430 video (NV40 family chipset: NV4E (C51)) with 128M of shared RAM.
Sometimes artifacts appear on screen after a fresh boot to the console, and trying to start X freezes the system.
Sometimes the machine freezes after running X for a while.
Running an OpenGL application freezes the system immediately.
No problems after downgrading to linux-3.12.9-2.
Comment by cirrus (cirrus) - Wednesday, 12 March 2014, 22:24 GMT
I'm experiencing this too on fully updated i686 boxes. GPU: 8800GT with dual DVI screens. Sometimes one VDU freezes and shows a solid colour while I still have mouse and keyboard input on VDU2; sometimes both VDUs freeze and only a cold boot fixes it (it seems to happen after the first boot of the day). dmesg and Xorg.0.log don't show much relevant info. I've since reverted to the blob.
Comment by Taylor Lookabaugh (Taylor) - Sunday, 16 March 2014, 06:58 GMT
This has happened to me over the past couple of days. At first I thought KDE was eating up all the memory, then I realized it was the video card driver instead. I had originally thought it was a complete system freeze and not just a display freeze.
Comment by Karl Yngve Lervåg (lervag) - Tuesday, 10 June 2014, 15:30 GMT
This happens to me once or twice every week. However, here it does not freeze. It only makes the X server restart, which means it crashes my current session and kills all running programs and services. Quite annoying. I don't remember when it started happening, but February this year sounds about right.

I'd be happy to help fix this, but I have no idea where to start.
Comment by Tobias Powalowski (tpowa) - Wednesday, 13 August 2014, 07:22 GMT
Status on 3.16?
Comment by Taylor Lookabaugh (Taylor) - Wednesday, 13 August 2014, 17:28 GMT
I believe this has been fixed for me as of 3.14 (or was it 3.15?). It had something to do with the fan controls in the nouveau code getting stuck.

https://bugs.freedesktop.org/show_bug.cgi?id=76788
Comment by Christian Fillion (cfillion) - Saturday, 13 September 2014, 18:53 GMT
  • Field changed: Percent Complete (100% → 0%)
Not fixed: it happened again even with 3.16.1-1.

My log excerpt: http://sprunge.us/JYPg
Comment by Christian Fillion (cfillion) - Saturday, 08 November 2014, 02:14 GMT
Still happening with linux 3.17.2-1. I noticed it crashes almost exclusively when Ardour is in use.
Comment by Grisha Georgiev (gsg) - Wednesday, 25 March 2015, 08:56 GMT
For me the problem is solved with linux 3.19.2-1.
