FS#20200 - [kernel26] KMS bug: Compiz white-screens with 2.6.34 - works great with LTS 2.6.32

Attached to Project: Arch Linux
Opened by David C. Rankin (drankinatty) - Saturday, 17 July 2010, 07:13 GMT
Last edited by Andreas Radke (AndyRTR) - Saturday, 07 August 2010, 18:49 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To Tobias Powalowski (tpowa)
Thomas Bächler (brain0)
Andreas Radke (AndyRTR)
Architecture All
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 1
Private No

Details

Description:

I still can't get compiz to quit white-screening when running the 2.6.34 kernels. Compiz and the entire graphics system on my laptop work perfectly with the LTS kernel, but when I upgraded to the 2.6.34 kernels, compiz white-screens due to a bug somewhere in the kernel/KMS/xf86-video-ati interplay. I use the xf86-video-ati module.

The lspci -vv info for my system is:

http://www.3111skyline.com/dl/Archlinux/bugs/Toshiba205d-lspcivv.txt

The dmidecode info is:

http://www.3111skyline.com/dl/Archlinux/bugs/toshiba205d-dmidecode.txt

Starting with the 2.6.32 kernels, everything was GREAT with this laptop and the graphic response had never been so good on the 'radeon' driver -- it was kicking ass! But with the 2.6.34 kernel, something is badly broken.

Now 'compositing' works fine on the box. I can toggle compositing on/off in Gnome and it is fine. There is nothing compiz specific in Xorg.0.log that is helpful (but see below for the massive differences between 2.6.34 and LTS Xorg.0.log files). .xsession-error catches a couple of errors when compiz tries to start:

Window manager warning: Invalid WM_TRANSIENT_FOR window 0x111 specified for 0x5e0017b (Configure ).
Window manager warning: Invalid WM_TRANSIENT_FOR window 0x111 specified for 0x5e004cd (Configure ).
Window manager warning: Invalid WM_TRANSIENT_FOR window 0x111 specified for 0x5e0099b (Configure ).
* Detected Session: gnome
* Searching for installed applications...
** Message: pygobject_register_sinkfunc is deprecated (GtkWindow)
** Message: pygobject_register_sinkfunc is deprecated (GtkInvisible)
** Message: pygobject_register_sinkfunc is deprecated (GtkObject)
compiz (core) - Error: Plugin 'text' not loaded.

compiz (shift) - Warn: No compatible text plugin loaded.
Window manager warning: Received a _NET_WM_MOVERESIZE message for 0x4800021 (CompositeT); these messages lack timestamps and therefore suck.
Window manager warning: Received a _NET_WM_MOVERESIZE message for 0x4800021 (CompositeT); these messages lack timestamps and therefore suck.
Window manager warning: Received a _NET_WM_MOVERESIZE message for 0x4800021 (CompositeT); these messages lack timestamps and therefore suck.
Window manager warning: Received a _NET_WM_MOVERESIZE message for 0x4800021 (CompositeT); these messages lack timestamps and therefore suck.
Window manager warning: Invalid WM_TRANSIENT_FOR window 0x111 specified for 0x5800006 (Session Ch).

but I can't really decipher them. The compiz (core) and compiz (shift) lines are normal; I also get them when starting compiz under LTS, and compiz starts fine there.

Where there are MASSIVE differences is in the initialization of my video card in the Xorg.0.log files. The LTS Xorg.0.log is nearly twice as large as the Xorg.0.log file from the 2.6.34 kernel startup. Much of it is just gibberish to me, but I'm sure the bug lies in the way the 2.6.34 kernel is trying to initialize my GPU. I have saved copies of each file for review:

2.6.34.1 kernel (35k):
http://www.3111skyline.com/dl/Archlinux/bugs/compiz/34.1/gnome/Xorg.0.log

LTS-kernel (62k)
http://www.3111skyline.com/dl/Archlinux/bugs/compiz/lts/gnome/Xorg.0.log

Obviously something in the way my video card is initialized changed between 2.6.32 and 2.6.34 and introduced a bug that is breaking compiz. I have looked at all the logs, but most of it is just gibberish to me. I simply don't know what I'm looking for. However, I'm happy to help in any way you need or provide any information that will help. Just let me know what you want and I'll be happy to be your hands on the keyboard at this end.
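One way to narrow down where two Xorg.0.log files diverge is to compare only their error and warning lines. The following is a minimal Python sketch, not something used in this report: it relies on Xorg's standard (EE) and (WW) line markers, and the function names `flagged_lines` and `only_in_first` are hypothetical.

```python
import re

# Xorg prefixes log lines with a marker: (EE) is an error, (WW) a warning.
# Some X server versions put a "[ timestamp]" in front, so allow for that.
MARKER = re.compile(r"^\s*(?:\[\s*[\d.]+\]\s*)?\((EE|WW)\)\s*(.*)$")

def flagged_lines(log_text):
    """Return the set of (marker, message) pairs for error and warning lines."""
    found = set()
    for line in log_text.splitlines():
        match = MARKER.match(line)
        if match:
            found.add((match.group(1), match.group(2).strip()))
    return found

def only_in_first(log_a, log_b):
    """Errors and warnings that appear in log_a but not in log_b, sorted."""
    return sorted(flagged_lines(log_a) - flagged_lines(log_b))
```

Reading the 2.6.34 log and the LTS log into two strings and calling `only_in_first(log_34, log_lts)` would list the complaints that appear only under the newer kernel, which is a shorter starting point than the full files.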

Thanks.

This task depends upon

Closed by  Andreas Radke (AndyRTR)
Saturday, 07 August 2010, 18:49 GMT
Reason for closing:  Fixed
Comment by Andreas Radke (AndyRTR) - Saturday, 17 July 2010, 09:09 GMT
try a .35rc kernel to see if it's been fixed in the meantime. if not, ask on the upstream radeon list and then probably file the issue in the Xorg tracker for radeon.
Comment by David C. Rankin (drankinatty) - Saturday, 17 July 2010, 23:57 GMT
OK, I'll give that a try.

For completeness here, I have more data that might help or at least be coordinated in one place for reference by the radeon folks.

I have another spare drive for my laptop with Arch installed on it and it is still at 2.6.33.4. My video card initializes without any problem, but the performance with compiz sucks compared to the current 2.6.32 LTS kernel. Compiz works fine, it's just much slower than with the current LTS setup. So in these 3 data sets, we are bound to be able to identify: (1) what changed; (2) where; and (3) what problems the changes have caused with regard to the RS690M installed in my laptop.

I have captured the dmesg output, lspci -vv, dmidecode, Xorg.0.log and .xsession-errors for the 2.6.33.4 kernel setup. They are available here:

dmesg (34k):
http://www.3111skyline.com/download/Archlinux/bugs/compiz/33.4-1/gnome/dmesg-2.6.33.4.txt

lspci -vv (28k):
http://www.3111skyline.com/download/Archlinux/bugs/compiz/33.4-1/gnome/lspci-vv-2.6.33.4.txt

dmidecode (8k):
http://www.3111skyline.com/download/Archlinux/bugs/compiz/33.4-1/gnome/dmidecode_2.6.33.4.txt

Xorg.0.log (27k):
http://www.3111skyline.com/download/Archlinux/bugs/compiz/33.4-1/gnome/Xorg.0.log

.xsession-errors (7k):
http://www.3111skyline.com/download/Archlinux/bugs/compiz/33.4-1/gnome/xsession-errors


Comment by David C. Rankin (drankinatty) - Sunday, 18 July 2010, 00:31 GMT
Andreas - where do I get the kernel? I'll also ask on the list. Thanks.
Comment by David C. Rankin (drankinatty) - Sunday, 18 July 2010, 20:49 GMT
I built 2.6.35rc. It built fine; all default values were used, except that the Ath9k modules were activated (I have an Atheros wireless card).

Bad news, the kernel boots to what I guess is the KMS switch (that part where your screen flips from normal text mode to the higher resolution graphics mode) and then:

(1) a 3-4 pixel horizontal white line flashes across the 'center' of the screen for approximately 1-2 seconds; and then

(2) the screen goes completely black (dead unpowered black) and the box hardlocks.

So the 2.6.35rc test was a bust. Do you think recompiling without the Ath9k module enabled would make any difference? Let me know. I guess the next step is to get with the radeon upstream folks and get them involved because the kernel/atiRS690M problem has gone from:

good (2.6.32 kernel); to

bad (2.6.33 kernel - compiz works but slow); to

worse (2.6.34 only boots less than 20% of the time and compiz whitescreens every time); to

completely unusable (2.6.35rc kernel - won't boot at all)

Let me know if you think recompiling w/o ath9k will make any difference and if any of you have contacts on the radeon upstream list, pass them along so I at least have an idea of who has any Arch knowledge on that list.

Any other thoughts let me know, because it is apparent something has to be fixed before 2.6.35 moves forward.
Comment by Andreas Radke (AndyRTR) - Monday, 19 July 2010, 04:41 GMT
I see "vesafb" in your dmesg output?! What's your kernel append line in grub/lilo? Don't append any framebuffer/graphics settings!

fb: conflicting fb hw usage radeondrmfb vs VESA VGA - removing generic driver

The kernel says it in clear words to you.
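For anyone checking their own dmesg for the same handover, a small Python sketch can pick out this message. It is only an illustration: the pattern is built from the single line quoted above, and the function name `framebuffer_conflicts` is hypothetical.

```python
import re

# Pattern modelled on the kernel message quoted above:
# "fb: conflicting fb hw usage radeondrmfb vs VESA VGA - removing generic driver"
CONFLICT = re.compile(
    r"fb: conflicting fb hw usage (\S+) vs (.+?) - removing generic driver"
)

def framebuffer_conflicts(dmesg_text):
    """Return (new_driver, displaced_driver) pairs found in dmesg output."""
    return [(m.group(1), m.group(2)) for m in CONFLICT.finditer(dmesg_text)]
```

A non-empty result means a generic framebuffer such as vesafb was already active when the DRM driver took over, which points back at a vga= or similar setting on the kernel line.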
Comment by David C. Rankin (drankinatty) - Monday, 19 July 2010, 06:07 GMT
I removed everything from the kernel line, set up KMS early again on both 2.6.34 and 2.6.35rc, and recompiled 2.6.35rc without Ath9k.

There was NO difference in behavior.

2.6.34 Configured with KMS Early - gets to the point where it does the KMS flip and then I get spaghetti / Call Trace ... [lots of junk....] all down the screen.

2.6.35rc Configured with KMS Early - gets to KMS flip and hardlocks the box.

I'll go remove radeon from mkinitcpio.conf, re-configure for KMS late, and try that once more, but with any KMS early, the current kernel and the rc kernel will NOT boot.

Comment by David C. Rankin (drankinatty) - Monday, 19 July 2010, 06:11 GMT
Just FYI, LTS kernel vesa info is:

vesafb: framebuffer at 0xf0000000, mapped to 0xffffc90001f00000, using 7776k, total 16384k
vesafb: mode is 1152x864x32, linelength=4608, pages=3
vesafb: scrolling: redraw
vesafb: Truecolor: size=0:8:8:8, shift=0:16:8:0
Console: switching to colour frame buffer device 144x54
fb0: VESA VGA frame buffer device
Linux agpgart interface v0.103

Just to make sure I'm not screwing something up, here are the kernel lines I'm using:

# (0) Arch Linux
title Arch Linux
root (hd0,0)
kernel /vmlinuz26 root=/dev/disk/by-uuid/b004715c-1666-458a-b827-2bbb1d4a735e ro radeon.modeset=1
initrd /kernel26.img

<snip fallback>

# (2) Arch Linux
title Arch Linux-lts
root (hd0,0)
kernel /vmlinuz26-lts root=/dev/disk/by-uuid/b004715c-1666-458a-b827-2bbb1d4a735e ro vga=0x356
initrd /kernel26-lts.img

<snip fallback>

# (4) Arch Linux
title Arch Linux-rc
root (hd0,0)
kernel /vmlinuz26-rc root=/dev/disk/by-uuid/b004715c-1666-458a-b827-2bbb1d4a735e ro radeon.modeset=1
initrd /kernel26-rc.img

I'm not using KMS early on LTS...
Comment by David C. Rankin (drankinatty) - Monday, 19 July 2010, 07:10 GMT
I re-configured everything for KMS late in both 2.6.34 and 2.6.35rc - no change.

2.6.34 - booted, but graphics performance was terrible and compiz whitescreened, just as it had when I opened this bug. I suspect the vga= line was just preventing the subsequent boots, because when I undid KMS early and booted 2.6.34, it behaved like it did right after the kernel install. (Well, come to think of it, it should have booted this time anyway because I had just rebuilt the initramfs. If it boots again, then the vga= line was what was locking up the subsequent boots.)

2.6.35rc - Makes no difference using KMS late - gets to the "Loading Modules" line and hardlocks the box.

Can you think of any more information/files/etc. that I could look at and post that might be helpful to you? Right now I have 2.6.34, LTS and 2.6.35rc on the box, so let me know if you have more tests you want me to try.

Thanks.
Comment by Andreas Radke (AndyRTR) - Monday, 19 July 2010, 07:41 GMT
kernel /vmlinuz26-rc root=/dev/disk/by-uuid/b004715c-1666-458a-b827-2bbb1d4a735e ro

the kernel append line should only look like this. radeon.modeset=1 is now the default and no longer necessary since .34 kernels. please only use kms late mode to not run into mkinitcpio issues.
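The clean-up described here can be sketched in Python as a quick check of a GRUB kernel line. This is an illustration under assumptions: the helper name `clean_kernel_line` is hypothetical, and `video=` is added to the drop list as an assumed example of a further framebuffer setting; only `vga=` and `radeon.modeset=` actually appear in this thread.

```python
# Parameter prefixes to drop: legacy framebuffer settings and the
# radeon.modeset switch, which is described above as the default since .34.
# "video=" is an assumption; it does not appear in the thread.
DROP_PREFIXES = ("vga=", "video=", "radeon.modeset=")

def clean_kernel_line(line):
    """Remove graphics-related parameters from a GRUB 'kernel' line."""
    kept = [word for word in line.split() if not word.startswith(DROP_PREFIXES)]
    return " ".join(kept)
```

Applied to the three kernel lines posted earlier, this drops `radeon.modeset=1` and `vga=0x356` and leaves only the image path, the root= device and `ro`.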

please make sure you add all needed modules for radeon in front of it in your rc.conf's modules line. my notebook also often failed to boot because the agp module wasn't loaded early enough.

if you are sure the modules are all loaded properly and kms late mode comes up, check dmesg/everything.log. running Xorg and its speed is something different. first make sure the kernel drm module does its work.
Comment by Andreas Radke (AndyRTR) - Tuesday, 27 July 2010, 10:36 GMT
no comments? is it solved?
Comment by David C. Rankin (drankinatty) - Wednesday, 04 August 2010, 04:44 GMT
  • Field changed: Percent Complete (100% → 0%)
Problem Continues in 2.6.34-2
Comment by Andreas Radke (AndyRTR) - Wednesday, 04 August 2010, 22:21 GMT
How's .35 kernel in testing?
Comment by David C. Rankin (drankinatty) - Saturday, 07 August 2010, 06:56 GMT
Hey Andreas,

Here is a brief update. I'll have to try .35 in testing on Saturday. Today I installed the new production kernel-2.6.34.2-2-x86_64. The hardlock on boot is *worse*. I cannot even boot the 1st time after the initramfs is created. Something is really fscked up. I have also talked with other users on the suse list and there are several laptops there that won't boot the 2.6.34 either.

From testing, I installed: kernel26-2.6.35-2 kernel26-headers-2.6.35-2 madwifi-0.9.4.4133-1 and I am surprised. I am running the .35-2 kernel and at least for the first boot after the kernel install (new initramfs) the box booted with 'debug' added to the kernel line:

01:51 alchemy:~> uname -a
Linux alchemy 2.6.35-ARCH #1 SMP PREEMPT Wed Aug 4 12:28:25 CEST 2010 x86_64 AMD Turion(tm) 64 X2 Mobile Technology TL-58 AuthenticAMD GNU/Linux

What's more - compiz started!! Now this isn't to say it won't fail to boot on my next attempt, but that will have to wait until the a.m. (my eyes are burning and I'm exhausted). Things are definitely looking up compared to the 34.2-2 kernel. I'll post more tomorrow.
Comment by David C. Rankin (drankinatty) - Saturday, 07 August 2010, 16:00 GMT
Argh! I hate this. It's fixed and I don't know what was wrong to begin with!!

Ok, on the other hand -- it is fixed in .35. I have been able to boot successively and compiz works. What was the issue?
Comment by Andreas Radke (AndyRTR) - Saturday, 07 August 2010, 18:48 GMT
not sure. probably one of the drm fixes in the .35 series solved it for you. be happy. closing this one.
