Arch Linux

FS#30219 - {hardware} linux 3.3.8-1 fails to build from ABS (segfaults)

Attached to Project: Arch Linux
Opened by Heiko Baums (cyberpatrol) - Saturday, 09 June 2012, 16:11 GMT
Last edited by Gaetan Bisson (vesath) - Wednesday, 03 October 2012, 14:37 GMT
Task Type: Bug Report
Category: Packages: Core
Status: Closed
Assigned To: No-one
Architecture: All
Severity: Low
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 0
Private: No

Details

Description:
Linux 3.3.8-1 can't be built from ABS. Compiling always segfaults at random positions. When this happens strange things happen with X at the same time. Therefore it's not possible to compile customized kernels like linux-fbcondecor (https://aur.archlinux.org/packages.php?ID=50924).

I first compiled the updated linux-fbcondecor, then I removed (commented out) the fbcondecor patch, so that it's basically the stock kernel in a single package. After that I tried to compile the stock kernel from ABS twice. Always the same result.

When I work in X while compiling on a text console, one of the following happens: the screen suddenly goes black while the mouse cursor is still there and working; the screen goes totally black and neither mouse nor keyboard responds anymore; just the window of a video player goes black; or I suddenly get logged out of X and end up back at xdm.

I've attached some build logs.

Btw., in the linux build logs you should replace 3.3.8-2 by 3.3.8-1. I've just incremented $pkgrel to 2 to not overwrite the packages downloaded from [core].

The system was updated with pacman -Syu or yaourt -Syua immediately before I tried to compile the kernel. So the system was up-to-date.

Closed by: Gaetan Bisson (vesath)
Wednesday, 03 October 2012, 14:37 GMT
Reason for closing: None
Additional comments about closing: There is no linux-3.3.8 in ABS.
Comment by Heiko Baums (cyberpatrol) - Saturday, 09 June 2012, 16:12 GMT
Of course the subject should also say 3.3.8-1 instead of 3.3.8-2.
Comment by Dave Reisner (falconindy) - Saturday, 09 June 2012, 16:23 GMT
Obviously, cannot reproduce. What other foreign packages do you have installed?
Comment by Heiko Baums (cyberpatrol) - Saturday, 09 June 2012, 16:51 GMT
I have a lot of packages installed from AUR (132), but none of them should interfere with the kernel compilation, none of them did before, and none of them was running. Even fbsplash is not the reason, because the same happened after booting into linux without fbsplash.

I just tried to rebuild linux-fbcondecor 3.3.7-1 which was no problem when I updated the AUR package to this version. Again the same issue. The only thing that has changed meanwhile was the mainboard of my PC and I think a new gcc version. But as far as I can see every component of the mainboard is detected by udev and the necessary modules are loaded correctly.

Kernel.log shows that regularly X related software is crashing, most of the time xfce4-netload, sometimes kaffeine and sometimes X itself.

Other packages from AUR compile without a problem.
Comment by Dave Reisner (falconindy) - Saturday, 09 June 2012, 16:53 GMT
'pacman -Qm' output is still relevant. Your pacman.log is, too.
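One way to narrow down what changed between a working and a failing build is to list the upgrades recorded in pacman.log since a given date. The following is a minimal sketch, not an official tool: it assumes log lines of the form `[timestamp] upgraded name (old -> new)`, optionally with a bracketed tag after the timestamp, and it compares timestamps as plain strings. The function name `upgrades_since` and the sample entries (including the `examplepkg-*` names and versions) are made up for illustration.

```python
import re

# Assumed line shape: "[2012-06-08 09:31] upgraded name (old -> new)".
# An optional bracketed tag such as "[ALPM]" after the timestamp is tolerated.
UPGRADE_RE = re.compile(
    r"^\[(?P<when>[^\]]+)\]\s+(?:\[[A-Z-]+\]\s+)?"
    r"upgraded\s+(?P<pkg>\S+)\s+\((?P<old>\S+)\s+->\s+(?P<new>\S+)\)\s*$"
)


def upgrades_since(log_lines, since):
    """Return (timestamp, package, old, new) for upgrades logged at or after `since`.

    Timestamps are compared as strings, which only works for
    year-first formats such as "2012-06-08 09:31".
    """
    found = []
    for line in log_lines:
        m = UPGRADE_RE.match(line)
        if m and m.group("when") >= since:
            found.append(
                (m.group("when"), m.group("pkg"), m.group("old"), m.group("new"))
            )
    return found


# Hypothetical log entries, not taken from the reporter's system.
SAMPLE_LOG = [
    "[2012-05-20 10:00] upgraded examplepkg-a (1.0-1 -> 1.1-1)",
    "[2012-06-08 09:30] installed examplepkg-b (2.0-1)",
    "[2012-06-08 09:31] upgraded examplepkg-c (3.0-1 -> 3.0-2)",
]

for entry in upgrades_since(SAMPLE_LOG, "2012-06-01"):
    print(entry)
```

On a real system the lines would come from reading /var/log/pacman.log instead of the sample list.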
Comment by Jan de Groot (JGC) - Saturday, 09 June 2012, 17:14 GMT
As your system becomes completely unstable while compiling, I would suggest checking your system for hardware failure. Either the system runs too hot or you have defective memory.
Comment by Heiko Baums (cyberpatrol) - Saturday, 09 June 2012, 18:59 GMT
A hardware failure can be ruled out. Changing my mainboard solved my hardware issues, which didn't include build failures. And the system doesn't run hot. Having the CPU run at full load on both cores shouldn't affect compiling either. That was always the case in the past. Even if the CPU ran at up to 70°C, compiling the kernel would still be possible. The memory is OK, too, as far as I can tell. I'll run memtest86+ anyway.

But I restarted my system, this time without X, and tried to compile the kernel again. Now it worked without any problems. So I guess it's a bug in gcc 4.7.0, xorg or both of them. I'll do some more tests with gcc 4.7.0 and gcc 4.6.3. Btw., haven't I read recently in the mailing list that gcc 4.6 was moved to [extra] because some packages have problems with gcc 4.7.0? Maybe it's true for the kernel, too.

Btw., I won't post the whole 'pacman -Qm' output here. You can believe me, that there is nothing which can cause such a problem. And nothing relevant has changed there.
Comment by Heiko Baums (cyberpatrol) - Saturday, 09 June 2012, 21:10 GMT
Downgraded gcc to 4.6.3, still the same. So it's not gcc 4.7.0. Ran memtest86+ without any errors as expected. So it's pretty unlikely that it is a hardware failure. CPU temperature is, btw., about 40°C.
Comment by Heiko Baums (cyberpatrol) - Tuesday, 12 June 2012, 14:19 GMT
  • Field changed: Percent Complete (100% → 0%)
Do we start this again? Just closing a bug without looking for a reason? For me it does not work. It's not a hardware failure. So there must be a bug somewhere, either in the kernel source (unlikely), the kernel itself (memory management), in the gcc, in xorg or probably in kdelibs.

If it works for someone, doesn't mean that it must work for everyone. A bug can occur only under certain circumstances.
Comment by Allan McRae (Allan) - Tuesday, 12 June 2012, 14:22 GMT
That is a hardware issue. You have gcc ICEs occurring in different places in all four build files. And in multiple logs, the ICE occurs in multiple build threads at the same time.
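The check described here, whether the internal compiler errors land in the same place or in different places across builds, can be sketched in a few lines. This is an illustration only: it assumes GCC's usual `file:line:col: internal compiler error: message` diagnostic shape, the helper names `find_ice_sites` and `ice_sites_differ` are hypothetical, and the two sample logs are invented rather than taken from the attached build logs.

```python
import re

# Matches GCC diagnostics such as
#   fs/ext4/inode.c:2301:1: internal compiler error: Segmentation fault
# The column number is optional.
ICE_RE = re.compile(
    r"^(?P<file>[^\s:]+):(?P<line>\d+):(?:\d+:)?\s*"
    r"internal compiler error: (?P<msg>.+)$",
    re.MULTILINE,
)


def find_ice_sites(log_text):
    """Return (file, line) pairs for every ICE reported in one build log."""
    return [
        (m.group("file"), int(m.group("line"))) for m in ICE_RE.finditer(log_text)
    ]


def ice_sites_differ(logs):
    """True if the logs that contain ICEs report them at different locations."""
    site_sets = {frozenset(find_ice_sites(text)) for text in logs}
    site_sets.discard(frozenset())  # ignore logs without any ICE
    return len(site_sets) > 1


# Invented build-log excerpts for illustration.
SAMPLE_LOG_A = """\
  CC      fs/ext4/inode.o
fs/ext4/inode.c:2301:1: internal compiler error: Segmentation fault
make[2]: *** [fs/ext4/inode.o] Error 1
"""

SAMPLE_LOG_B = """\
  CC      net/ipv4/tcp.o
net/ipv4/tcp.c:880:12: internal compiler error: Segmentation fault
make[2]: *** [net/ipv4/tcp.o] Error 1
"""

print(find_ice_sites(SAMPLE_LOG_A))
print(ice_sites_differ([SAMPLE_LOG_A, SAMPLE_LOG_B]))
```

A compiler bug triggered by a specific source construct would normally fail at the same location on every run, so differing locations across builds of the same tree point away from a deterministic software bug. The sketch cannot tell which component is at fault.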
Comment by Heiko Baums (cyberpatrol) - Tuesday, 12 June 2012, 22:02 GMT
  • Field changed: Percent Complete (100% → 0%)
EDAC doesn't seem to be supported. I get "edac-util: Error: No memory controller data found." or "edac-util: EDAC drivers loaded. No memory controllers found".

Nevertheless, memtest86+ didn't find an error in my memory. If there were a hardware issue, I would have a lot of other issues and instabilities, which was, btw., the case with my old mainboard. This is not the case with my new mainboard anymore. So there must be a software bug somewhere.

That it crashes randomly is not evidence of a hardware issue either, because software is run randomly, too. I unfortunately can't downgrade xorg-server anymore, because older versions don't seem to run anymore. The crash can also appear in several randomly occurring constellations.

I know that this is hard to diagnose, and I don't know how to do it exactly in this case, but I also know that this is definitely not a hardware issue, because I don't have any other instabilities or hardware failures. I only have this problem when compiling the kernel and working with X at the same time, which worked before when the kernel was upgraded from 3.3.6-1 to 3.3.7-1.

Compiling just the kernel without having X running works without any problems or instabilities. This is further evidence that it is not a hardware issue.
Comment by Jan de Groot (JGC) - Tuesday, 12 June 2012, 22:03 GMT
So, what is your hardware configuration anyways? I'm still convinced this is a hardware problem. I've seen your "bug" before on an old Athlon XP system that was equipped with 3 DDR memory modules, memtest would not find any memory error, but a kernel compile would never finish without ICEs.
Comment by Heiko Baums (cyberpatrol) - Tuesday, 12 June 2012, 22:58 GMT
My system:

CPU: AMD Athlon64 X2 6000+
Old Mainboard: Gigabyte GA-MA770-DS3
New Mainboard: ASRock N68C-S UCC (chipset: NVIDIA MCP61)
RAM: 4 GB (2x 2 GB) DDR2 Corsair
Graphics card: ATI Radeon HD 3450
Audio card: M-Audio Audiophile 24/96
DVB-T card: Terratec Cinergy 1400 DVB-T

The CPU can't be the reason, because it worked with the same CPU before.
The BIOS settings are the defaults, or rather the energy-saving defaults, with some adjustments such as deactivated floppy and IDE devices.
Comment by Jan de Groot (JGC) - Tuesday, 12 June 2012, 23:03 GMT
CPUs can break. The memory controller is integrated in your CPU, so if that one is broken, replacing mainboards will not help much. Same for power supplies. You're also not very specific about the exact memory specifications. Did you test with only one stick of memory?
Comment by Heiko Baums (cyberpatrol) - Tuesday, 12 June 2012, 23:24 GMT
If the CPU breaks, you have many more issues than just a crash during a kernel compilation while X is running. A broken CPU is pretty unlikely. The same goes for defective memory.

The issues I had with the old mainboard were:
1. During early boot stage (initrd) the USB stick wasn't always recognized anymore.
2. When running X, the keyboard and mouse suddenly didn't respond anymore.
3. When running X the screen got blank, sometimes except for the mouse cursor.
4. When typing into a web form in Firefox the monitor suddenly switched into the standby mode.

I first changed the power supply. Didn't change anything.
Then I changed the mainboard. Fixed all of these issues.

Which other memory specifications do you mean? Of course, I tested the memory with both sticks.
