FS#34563 - [linux] 3.8.x - 3.10.x kernel drm (radeon_drv.so xorg) crashes

Attached to Project: Arch Linux
Opened by Linas (Linas) - Monday, 01 April 2013, 21:48 GMT
Last edited by Tobias Powalowski (tpowa) - Tuesday, 17 September 2013, 10:00 GMT
Task Type Bug Report
Category Upstream Bugs
Status Closed
Assigned To Tobias Powalowski (tpowa)
Thomas Bächler (brain0)
Architecture All
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 5
Private No

Details

Description:
I upgraded a lot of packages yesterday. This included the 1.13.3-1 -> 1.14.0-2 update of xorg-server{,-common} and other related packages (xf86-input-{evdev,keyboard,mouse} xf86-video-{ati,fbdev,vesa} libdrm mesa ati-dri...) as well as an upgrade of the linux package 3.7.10-1 -> 3.8.4-1.

Today, xorg started crashing. It can work ok for hours and then suddenly crashes with SIGBUS, and then goes on crashing continuously when it is respawned (needing a reboot*).

I'm not sure if I should be blaming xorg or the kernel. The backtraces point to radeon_drv.so (package xf86-video-ati 1:7.1.0-3). I am using a Radeon HD 3600. It is also worth noting that the kernel was booted with the radeon.no_wb=1 parameter.

In log files b and c, the crashes happened in radeon_drv.so called from AddScreen, while in log a it was called from libexa.so (which is also part of the xorg-server package).

It seemed to be a little more likely to happen with chromium open, but it could just as well be statistical noise. Some crashes happened after only a few minutes, but it has now been working happily for 3.5 hours without crashing.

Closed by  Tobias Powalowski (tpowa)
Tuesday, 17 September 2013, 10:00 GMT
Reason for closing:  No response
Comment by ed tomlinson (edt) - Friday, 05 April 2013, 19:58 GMT
I am also on a radeon (hd67xx TURKS 0x1002:0x6759 0x174B:0xE193) and xorg 1.14.0-2 is not stable here either. Symptoms are very similar to this bug's description.

One new piece of information: reverting to 1.13 (see next post) along with its dependencies gives me a stable box again, so the kernel is probably okay.

I am up to date, with testing disabled.
Comment by ed tomlinson (edt) - Friday, 05 April 2013, 20:05 GMT
With the following downgraded I have a stable system (1.13.2.901-2 is on another install)
warning: xf86-input-evdev: ignoring package upgrade (2.7.3-2 => 2.8.0-1)
warning: xf86-input-void: ignoring package upgrade (1.4.0-4 => 1.4.0-5)
warning: xf86-video-apm: ignoring package upgrade (1.2.5-2 => 1.2.5-3)
warning: xf86-video-ati: ignoring package upgrade (1:7.1.0-1 => 1:7.1.0-3)
warning: xf86-video-fbdev: ignoring package upgrade (0.4.3-2 => 0.4.3-3)
warning: xf86-video-v4l: ignoring package upgrade (0.2.0-11 => 0.2.0-12)
warning: xorg-server: ignoring package upgrade (1.13.3-1 => 1.14.0-2)
warning: xorg-server-common: ignoring package upgrade (1.13.3-1 => 1.14.0-2)
Comment by Linas (Linas) - Saturday, 06 April 2013, 15:30 GMT
That's funny: I had it working fine for several days after downgrading linux 3.8.4-1 -> 3.7.10-1 on 2nd April. Finally, last night I downgraded xorg and upgraded the kernel. Today xorg has already crashed three times. So it seems to be the kernel here.

I opened an upstream bug: https://bugzilla.kernel.org/show_bug.cgi?id=56311

This is what I did (and still crashes):
upgraded xorg-server-common (1.14.0-2 -> 1.13.3-1)
upgraded xf86-input-evdev (2.8.0-1 -> 2.7.3-2)
upgraded xorg-server (1.14.0-2 -> 1.13.3-1)
upgraded xf86-input-keyboard (1.7.0-1 -> 1.6.2-2)
upgraded xf86-input-mouse (1.9.0-1 -> 1.8.1-2)
upgraded xf86-video-ati (1:7.1.0-3 -> 1:7.1.0-1)
upgraded xf86-video-fbdev (0.4.3-3 -> 0.4.3-2)
upgraded xf86-video-vesa (2.3.2-3 -> 2.3.2-2)
upgraded linux (3.7.10-1 -> 3.8.4-1)
Comment by Jorge Villaseñor (salinasv) - Tuesday, 09 April 2013, 05:01 GMT
I have a similar problem (maybe not the same one, I can't tell). I can reproduce the problem every time I run StarCraft in Wine. Sometimes it takes a little while; other times it happens as soon as I start playing.

Before upgrading to xorg-server 1.14.0-2 I had a stable system.

I am also attaching some crash messages I see in dmesg; it looks like the GPU is stalling:
[ 8324.678418] CE: hpet increased min_delta_ns to 20113 nsec
[11043.378969] radeon 0000:04:00.0: GPU lockup CP stall for more than 10000msec
[11043.378981] radeon 0000:04:00.0: GPU lockup (waiting for 0x00000000001598ad last fence id 0x00000000001598a0)

This bug looks related too:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/986524
Comment by Jorge Villaseñor (salinasv) - Wednesday, 10 April 2013, 03:36 GMT
I can confirm: downgrading to xorg-server 1.13.3-1 "fixes" the problem. I guess something went wrong with Xorg.

From pacman.log:
[2013-04-09 00:09] [PACMAN] Running 'pacman -U xf86-input-evdev-2.7.3-2-x86_64.pkg.tar.xz xf86-input-keyboard-1.6.2-2-x86_64.pkg.tar.xz xf86-input-mouse-1.8.1-2-x86_64.pkg.tar.xz xf86-video-ati-1:7.1.0-1-x86_64.pkg.tar.xz xf86-video-fbdev-0.4.3-2-x86_64.pkg.tar.xz xf86-video-vesa-2.3.2-2-x86_64.pkg.tar.xz xorg-server-common-1.13.3-1-x86_64.pkg.tar.xz xorg-server-1.13.3-1-x86_64.pkg.tar.xz xorg-server-devel-1.13.3-1-x86_64.pkg.tar.xz'
[2013-04-09 00:10] [PACMAN] downgraded xf86-input-evdev (2.8.0-1 -> 2.7.3-2)
[2013-04-09 00:10] [PACMAN] downgraded xf86-input-keyboard (1.7.0-1 -> 1.6.2-2)
[2013-04-09 00:10] [PACMAN] downgraded xf86-input-mouse (1.9.0-1 -> 1.8.1-2)
[2013-04-09 00:10] [PACMAN] downgraded xf86-video-ati (1:7.1.0-3 -> 1:7.1.0-1)
[2013-04-09 00:10] [PACMAN] downgraded xf86-video-fbdev (0.4.3-3 -> 0.4.3-2)
[2013-04-09 00:10] [PACMAN] downgraded xf86-video-vesa (2.3.2-3 -> 2.3.2-2)
[2013-04-09 00:10] [PACMAN] downgraded xorg-server-common (1.14.0-2 -> 1.13.3-1)
[2013-04-09 00:10] [PACMAN] downgraded xorg-server (1.14.0-2 -> 1.13.3-1)
[2013-04-09 00:10] [PACMAN] downgraded xorg-server-devel (1.14.0-2 -> 1.13.3-1)
Comment by Jorge Villaseñor (salinasv) - Tuesday, 16 April 2013, 05:17 GMT
OK, I have just hit the same issue with the old xorg-server (1.13.3-1).

So it still crashes, but much less often. I would like to report this upstream, but I am not sure *whose* upstream problem it is. Can you point me in the right direction?
Comment by Linas (Linas) - Tuesday, 16 April 2013, 12:34 GMT
I guess you're using linux-3.8.6-1, and not 3.7.10-1?

On https://bugzilla.kernel.org/show_bug.cgi?id=56311 Alex Deucher said it is a mesa bug in features enabled only on 3.8 kernels, pointing to https://bugs.freedesktop.org/show_bug.cgi?id=61182

That one seems a bit messy, but the "resources occupy a lot of memory" explanation makes sense, as "using a lot of memory" seemed to play a part in it.
Comment by Linas (Linas) - Wednesday, 17 April 2013, 22:38 GMT
For the record, it crashes not only in linux-3.8.6-1 but also in 3.8.7-1.
Comment by Jorge Villaseñor (salinasv) - Friday, 19 April 2013, 18:07 GMT
Confirmed. Yesterday I downgraded to linux 3.7.10-2 and played for a good amount of time without an issue.
Comment by Linas (Linas) - Wednesday, 24 April 2013, 21:27 GMT
I recompiled mesa 9.1.1-1 reverting commit 35840ab189 (identified in the freedesktop bug as introducing the issue), and crashes still happen.
It's possible, however, that (as there were a couple of conflicts) I didn't revert it correctly, or that some later commit also causes crashes. But the modified mesa (packages mesa, mesa-libgl, ati-dri, intel-dri, nouveau-dri, svga-dri) still fails. I am attaching the mesa-git-fixes.patch I used (it's the same file that was in the package, with the revert of that commit appended).
Comment by Kai (freejack) - Wednesday, 01 May 2013, 06:04 GMT
Confirmed. I can reproduce the bug with Firefox 20.0.1 on a certain web page. Just scrolling down kills X instantly.

Firefox does not use GPU-accelerated windows in this setup by default. When I force it (layers.acceleration.force-enabled=true), about:support reports accelerated windows, and Firefox runs stably. Very strange, just the opposite of what one might expect...
Comment by Linas (Linas) - Thursday, 02 May 2013, 11:21 GMT
Kai, is that a page you can share?

PS: Still crashing with 3.8.10-1-ARCH
Comment by Kai (freejack) - Saturday, 04 May 2013, 03:13 GMT
Linas, it used to crash on the news page of whiskyexperts.at, which was quite long and had several embedded videos. They just rearranged their site and removed the news sub-pages (Google still finds them). With the current version it's no longer possible to reproduce the X bus error.
Comment by josefnpat (josefnpat) - Sunday, 19 May 2013, 04:34 GMT
Same problem here; it blows up whenever I try to do anything 3D.

Attached is a log captured with `startx 2> log.txt`.

I will revert to 1.13 and report if this stabilizes my system.

update:

Now using:

xf86-input-evdev-2.7.3-2-x86_64.pkg.tar.xz
xorg-server-1.13.3-1-x86_64.pkg.tar.xz
xorg-server-common-1.13.3-1-x86_64.pkg.tar.xz

But I still can't run 3D :(

update:

So, I practically broke everything. Once I fixed everything, 3D started working again.

I think it came down to the fact that I had `extra/xf86-input-mouse` installed, but I cannot verify it, as I tried a lot of different things.
Comment by Tobias Powalowski (tpowa) - Thursday, 23 May 2013, 19:58 GMT
Status on 3.9?
Comment by Linas (Linas) - Sunday, 26 May 2013, 21:15 GMT
Sadly, still crashing on 3.9.3-1
Comment by Linas (Linas) - Tuesday, 25 June 2013, 10:39 GMT
Still failing on 3.9.6-1-ARCH
Comment by agapito fernandez (agapito) - Monday, 22 July 2013, 12:22 GMT
I have a 7950 card and I had the GPU stalling problem. I can always reproduce this bug when I select the oxygen-gtk2 theme in Skype: the desktop freezes for 10 seconds. This happens with glamor acceleration enabled; if I don't use glamor, I don't have this problem.

Now I'm using packages from mesa-git repo, and I don't have this problem anymore with glamour enabled.
Comment by Tobias Powalowski (tpowa) - Tuesday, 30 July 2013, 10:34 GMT
Status on 3.10.x?
