FS#19733 - Update to glibc 2.12-2 on VIA C3 Nehemiah makes system unusable

Attached to Project: Arch Linux
Opened by Manfred Miederer (LessWire) - Monday, 07 June 2010, 01:41 GMT
Last edited by Allan McRae (Allan) - Monday, 25 October 2010, 01:01 GMT
Task Type: Bug Report
Category: Packages: Core
Status: Closed
Assigned To: Allan McRae (Allan)
Architecture: i686
Severity: Critical
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 8
Private: No

Details

CPU: VIA C3 Nehemiah and latest glibc

I have been running Arch Linux on this system for over a year now WITHOUT any problems, until yesterday.

What has happened:

upgrading binutils 2.20.1-2 => 2.20.1-3
upgrading glibc 2.11.1-3 => 2.12-2

After pacman installed the new glibc, I got an "invalid opcode" error and the pacman script aborted.

From then on, no command that uses glibc can be run; I always get "invalid opcode".
Rebooting the machine crashes the kernel immediately after mounting the "real" rootfs.

It looks as if this new version of glibc was not compiled for CPU = "generic"?

After downgrading to the old version of glibc (= 2.11.1-3), everything works fine.

Steps to reproduce:

Simply do this update on a machine with this CPU.

Closed by  Allan McRae (Allan)
Monday, 25 October 2010, 01:01 GMT
Reason for closing:  Fixed
Additional comments about closing:  As much as it is going to be...
Comment by Thomas Dziedzic (tomd123) - Monday, 07 June 2010, 02:05 GMT
Dunno if this is true but...

http://forum.openvz.org/index.php?t=msg&goto=3497& - "VIA C3 isn't pure i686 processor. AFAIK it's i386."

It looks like your processor is i386, not i686.

More reads: http://blueneon.xidus.net/bn/2005/06/05/gentoo-on-the-via-c3/
Comment by Manfred Miederer (LessWire) - Monday, 07 June 2010, 03:13 GMT
Tom:

This is a C3 "Nehemiah", which definitely behaves like a 686.

I have run it for several years, first with Debian and for over a year now with Arch Linux. I never had any issues; I did a lot of updates and everything worked fine! Remember that the kernel itself is the most critical piece, and it runs perfectly (currently 2.6.34).

You are right that there are C3 processors which are not compatible, but this one (Nehemiah) is!
Comment by Allan McRae (Allan) - Monday, 07 June 2010, 03:31 GMT
This was definitely compiled with our default CFLAGS. Can you try using 2.12-1?
Comment by Manfred Miederer (LessWire) - Monday, 07 June 2010, 03:31 GMT
By the way, read the article you recommended with the link to "blueneon":

"Update, to disambiguate things: The Nehemiah core and later fully support the CMOV instruction mentioned below! As far as I know, you can use i686 as the CHOST for those."

That's true, I agree. ;-)
Comment by Manfred Miederer (LessWire) - Monday, 07 June 2010, 03:33 GMT
Thanks Allan,

Where can I get 2.12-1? (I can't find it in the core repo.)
Comment by Allan McRae (Allan) - Monday, 07 June 2010, 03:48 GMT
Comment by Manfred Miederer (LessWire) - Monday, 07 June 2010, 04:08 GMT
Many thanks, i will try it tomorrow (pretty late here now).
Comment by Manfred Miederer (LessWire) - Tuesday, 08 June 2010, 02:02 GMT
2.12-1 works !

If the same CFLAGS are guaranteed to be used every time and an "invalid opcode" is thrown nevertheless, I would suspect that the compiler generates faulty code for this CPU type. I saw the latest version of gcc arrive at the same time as glibc.
Comment by Allan McRae (Allan) - Tuesday, 08 June 2010, 02:27 GMT
2.12-1 and 2.12-2 were built with the same compiler. The only difference is that I added "--disable-multi-arch" in 2.12-2. This should not cause breakages...

Comment by Manfred Miederer (LessWire) - Tuesday, 08 June 2010, 12:32 GMT
My son was supposed to run this test with 2.12-1, but he overlooked that I had locked the running version, so the newer package was never installed. Sorry for the confusion: 2.12-1 is also unusable.

I think compiling the new lib with an older gcc version could be the solution. I don't want to take up more of your time and could try that myself, but this VIA system does not have enough space to install all the necessary tools.

Comment by Allan McRae (Allan) - Wednesday, 09 June 2010, 04:15 GMT
Can you test http://dev.archlinux.org/~allan/glibc-2.12-2.1-i686.pkg.tar.xz ? That has the only patch that even seems mildly related from glibc git. I can find no mention of this on either the glibc or gcc bugtrackers.
Comment by Manfred Miederer (LessWire) - Friday, 11 June 2010, 01:59 GMT
... seems like I'm the only one running Arch Linux on this kind of CPU.

Thanks Allan, I downloaded it and will try it this weekend. This 24/7 system runs standalone without keyboard or screen, the BIOS doesn't like booting from USB, and it will cost time if the test fails and the package has to be downgraded from the "busybox shell" again :(
Comment by Mark Pustjens (unknown) - Friday, 11 June 2010, 23:04 GMT
I can confirm this bug.
After a recent update (20101512, pacman -Syu) I get `Illegal instruction' with every command.

This is on a system with the same CPU as the OP.

To confirm, I did a clean core install (archlinux-2010.05-core-i686.iso). This install worked fine.
After updating linux-api-headers I installed the glibc-2.12-2.1 package attached to this bug.
I again got `Illegal instruction' with every command.

Downgrading to glibc-2.11.1-3 (from the ISO) fixed my system.
Comment by Manfred Miederer (LessWire) - Saturday, 12 June 2010, 00:22 GMT
Many thanks Mark, i'm happy that i'm not alone :-)
No need to test 2.12-2.1 for myself.
Comment by Allan McRae (Allan) - Saturday, 12 June 2010, 04:16 GMT
I can find no bug about this upstream so first we need to confirm that glibc is the issue and not the fact it is built with gcc-4.5. Then we will need to bisect it down to a specific commit before upstream will consider it a bug...

That will take about 10 glibc builds to track down. I do not have that particular hardware, so I will provide a package for you to test; once you tell me whether it works, I will provide the next one... This will take a few weeks!

Of course, if either of you can do the bisect yourself, this will be much faster? If you can not, I will start uploading more test packages and waiting on your reports.
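
The estimate of about 10 builds follows from how bisection works: each test halves the range of candidate commits. A minimal sketch in Python, assuming a hypothetical history of 1000 commits with the breakage introduced at commit 637 (both numbers are made up for illustration, as are the names `find_first_bad` and `is_bad`):

```python
import math


def find_first_bad(commits, is_bad):
    """Return (first_bad_commit, tests_run) using binary search.

    Assumes commits are ordered oldest to newest, the oldest one is good,
    the newest one is bad, and every commit after the first bad one is bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1  # one test corresponds to one build plus one report
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi], tests


# Hypothetical history: commit numbers stand in for real glibc commit hashes.
commits = list(range(1000))
first_bad, tests = find_first_bad(commits, lambda c: c >= 637)
upper_bound = math.ceil(math.log2(len(commits)))
print(first_bad, tests, upper_bound)
```

In this thread each call to `is_bad` corresponds to one test package and one report from the affected hardware, which is why the round trips dominate the total time.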
Comment by Allan McRae (Allan) - Saturday, 12 June 2010, 11:33 GMT
OK, let's start tracking this down:
http://dev.archlinux.org/~allan/glibc-2.12-2.2-i686.pkg.tar.xz
(glibc-24c0bf7a)
Comment by Manfred Miederer (LessWire) - Monday, 14 June 2010, 03:58 GMT
Allan, that's nice, but as I mentioned before, the hassle for me is the downgrade after a test fails, because this system has no keyboard, screen or CD-ROM, can't boot from USB and has some other awkward constraints.
But a simple solution would probably be a statically linked version of "rsync" built against glibc <= 2.11.1. Or is there a way to set up another library path for rsync before the test?
Comment by Allan McRae (Allan) - Monday, 14 June 2010, 06:49 GMT
I guess you could probably just put the old /lib/libc.so.6 in a folder and use something like "LD_LIBRARY_PATH=/path/to/old/lib rsync" to do your rsync.
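
The same idea can be scripted. The sketch below is in Python and only prepares the environment; the helper names `env_with_library_path` and `run_with_old_libs` are hypothetical, and `/path/to/old/lib` is the placeholder from the comment above, not a real location.

```python
import os
import subprocess


def env_with_library_path(lib_dir, base_env=None):
    """Return a copy of the environment with lib_dir first in LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH")
    if existing:
        env["LD_LIBRARY_PATH"] = lib_dir + os.pathsep + existing
    else:
        env["LD_LIBRARY_PATH"] = lib_dir
    return env


def run_with_old_libs(command, lib_dir):
    """Run command (a list of arguments) with the old library directory preferred."""
    return subprocess.run(command, env=env_with_library_path(lib_dir), check=False)


# Example call, not executed here:
# run_with_old_libs(["rsync", "--version"], "/path/to/old/lib")
```

One caveat: LD_LIBRARY_PATH only changes where shared libraries are searched for. The dynamic loader itself is named by an absolute path inside each binary, so a broken loader in /lib would still be used.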
Comment by Manfred Miederer (LessWire) - Monday, 14 June 2010, 18:34 GMT
Thanks, your recommendation to set another library path works, but it was not needed:

2.12-2.2 runs !!!

I used "pacman -U ..." and i had to delete "/etc/ld.so.cache" first to get it installed.
Comment by Manfred Miederer (LessWire) - Monday, 14 June 2010, 18:44 GMT
In addition I installed "binutils-2.20.1-3", which depends on glibc>=2.12-1. It works.
Comment by Allan McRae (Allan) - Tuesday, 15 June 2010, 05:14 GMT
That confirms that it is not a gcc-4.5 issue but rather a glibc issue.

Next:
http://dev.archlinux.org/~allan/glibc-2.12-2.3-i686.pkg.tar.xz
(glibc-c60bce2c)
Comment by webnull (webnull) - Tuesday, 15 June 2010, 20:26 GMT
Comment by Manfred Miederer (LessWire) - Tuesday, 15 June 2010, 20:50 GMT
What C code would someone have to write to get valid machine code on one CPU and invalid machine code on another?
I guess the code is not very CPU-specific, so I think the compiler (gcc 4.5) remains under suspicion ;-)


Yes, 2.12-2.3 runs !
Comment by Allan McRae (Allan) - Tuesday, 15 June 2010, 23:39 GMT
Nah... this is still a glibc bug. webnull's bug is something stupid on his system...

Comment by Manfred Miederer (LessWire) - Wednesday, 16 June 2010, 00:36 GMT
again:
2.12-2.3 works (I edited my last comment, so maybe this was overlooked)
Comment by Allan McRae (Allan) - Wednesday, 16 June 2010, 04:05 GMT
Comment by Manfred Miederer (LessWire) - Wednesday, 16 June 2010, 13:30 GMT

2.12-2.4 doesn't work; it shows the same behaviour as the original upgrade.

Arrrgh! I can set LD_LIBRARY_PATH, but rsync doesn't follow --> invalid opcode :(
Comment by Allan McRae (Allan) - Wednesday, 16 June 2010, 14:20 GMT
Hmm... I was sure that the LD_LIBRARY_PATH would work... The "good news" is that each test from now on has a 50/50 chance of working, and there are probably going to be fewer steps to track down the breakage, given that there are a lot of SPARC-specific changes in the time period that the bug has been narrowed down to.

Next:
http://dev.archlinux.org/~allan/glibc-2.12-2.5-i686.pkg.tar.xz
(glibc-2fe000df)

Comment by Manfred Miederer (LessWire) - Wednesday, 16 June 2010, 16:07 GMT
LD_LIBRARY_PATH normally works fine, but not in this case. I also changed all the standard paths without success. The only working command is "cd".

50/50? I have to find a way to make things easier so that I can work from an SSH shell only. I think a chroot environment could do the job. What do you think?

Back to the topic:
Just as an experiment I restored only libc-2.11.90.so (from 2.12-2.3), but that doesn't help.

I have now restored everything from 2.12-2.3 and installed 2.12-2.5, and it doesn't work.
Comment by Allan McRae (Allan) - Wednesday, 16 June 2010, 23:51 GMT
Ah... if restoring only libc-2.11.90 does not work, then you are going to need more libraries in the folder pointed to by LD_LIBRARY_PATH. I think a chroot should expose the issue. You have a working and a non-working glibc, so there is only one way to find out...
Comment by Allan McRae (Allan) - Thursday, 17 June 2010, 00:21 GMT
Next:
http://dev.archlinux.org/~allan/glibc-2.12-2.6-i686.pkg.tar.xz
(glibc-741895aa)

I think I see the relevant change... if I am correct, then this package should work.
Comment by Manfred Miederer (LessWire) - Thursday, 17 June 2010, 00:46 GMT

... and you are correct - 2.12-2.6 works.

Can you say in a few words what the cause is?
Comment by Allan McRae (Allan) - Thursday, 17 June 2010, 00:56 GMT
I think it is this commit:
http://sourceware.org/git/?p=glibc.git;a=commit;h=01f1f5ee

However, that is one of 20 possible commits, so it is still not confirmed. I will rebuild the current glibc with just that commit reverted now to confirm this is the issue.
Comment by Manfred Miederer (LessWire) - Thursday, 17 June 2010, 01:25 GMT
ok, keep them coming ;-)
I have a chroot environment now; each test (and the restore of the old libs) takes only a few minutes.
Comment by Allan McRae (Allan) - Thursday, 17 June 2010, 01:40 GMT
http://dev.archlinux.org/~allan/glibc-2.12-2.7-i686.pkg.tar.xz

Just that one commit reverted... let's see how good at spotting the breakage I am!
Comment by Manfred Miederer (LessWire) - Thursday, 17 June 2010, 01:53 GMT
Very well spotted, this works too! But hey, didn't I always say the compiler is the issue? And didn't I ask for the "generic" switch? ;-)
Comment by Allan McRae (Allan) - Thursday, 17 June 2010, 03:28 GMT
Well... I was using the generic switch... glibc was just ignoring me.

So now I just have to figure how to properly fix this.
Comment by Manfred Miederer (LessWire) - Thursday, 17 June 2010, 04:06 GMT
That's a lot of effort for this rarely used CPU. Thank you for wasting your time :-)
Comment by Allan McRae (Allan) - Saturday, 19 June 2010, 14:33 GMT
It looks like this is the same issue as reported in Fedora for another couple of "i686" CPUs: https://bugzilla.redhat.com/show_bug.cgi?id=579838
Comment by Manfred Miederer (LessWire) - Sunday, 20 June 2010, 00:29 GMT
It is - definitely !
Reading the Fedora discussion, the cause is a "nopl" instruction, which has never been described in the Intel specs. Is that sloppiness, or was it done intentionally? :(
I have compiled some kernels on this machine and I know that they must be built with "-march/mtune=native" or "generic" and not "i686". That's why I kept asking for the "generic" switch.
It looks like a lot of Fedora users run a Geode CPU, so there should be a solution soon?
Comment by Allan McRae (Allan) - Sunday, 20 June 2010, 05:42 GMT
This is going to be marked as a "Won't Fix". glibc works correctly on an i686 CPU, using the definition of i686 used by gcc.

Your choices are:
1) build glibc with the causal commit reverted (see link above).
2) build a kernel that emulates NOPL instructions (see http://bbs.archlinux.org/viewtopic.php?pid=775414)

I recommend the custom kernel, as that is likely to be how this is fixed in the long term (although not with that patch exactly) and then you do not run the risk of other software crashing because that instruction gets included. I guess a patch to fix this will eventually work its way into the kernel mainline and thus Arch.
Comment by Markus Golser (elmargol) - Saturday, 11 September 2010, 08:48 GMT
Not all i686-class CPUs actually support the NOPL instruction. (Intel did not specify this instruction in the documentation for i686.)
CPUs affected include Via C3, Via Eden, AMD Geode LX (as used in OLPC), Transmeta Crusoe and Virtual PC.

Affected package: binutils

Related Bugreports:
http://www.sourceware.org/bugzilla/show_bug.cgi?id=6957
https://bugzilla.redhat.com/show_bug.cgi?id=579838

The binutils beta contains a fix for this
http://gcc.gnu.org/ml/gcc/2010-08/msg00194.html
Comment by Gabriel Francisco Santos (gabriel_frc) - Tuesday, 14 September 2010, 18:34 GMT
I have the same problems with these libs, which cause segmentation faults in Xterm, Zsh, Ftp, Amarok, Passwd and others. After downgrading to the old versions of glibc (= 2.11.1-3) and binutils (= 2.20.1-2), everything works fine.
My CPU is an AMD Phenom X4 9850, running Arch Linux x86_64.
Comment by Manfred Miederer (LessWire) - Tuesday, 14 September 2010, 23:17 GMT
Gabriel, your Phenom CPU has a fully compatible instruction set and certainly understands the NOPL instruction.

Your issue with glibc etc. has a different cause and does not belong in this thread!

By the way, as Allan wrote above, I always build my kernel for the VIA Nehemiah with NOPL emulation and everything works fine.
Comment by Markus Golser (elmargol) - Wednesday, 22 September 2010, 09:53 GMT
What do you mean by: "2010-09-14: A task closure has been requested. Reason for request: Solved (so far no better solution possible)"

There is no solution at all. There must be a better solution than no solution
Comment by Tad Fisher (tad) - Wednesday, 29 September 2010, 10:26 GMT
This was a bug in binutils. GAS was generating NOPL for i686-generic, which is wrong, since NOPL is *not* a documented part of the i686 specification. This has since been fixed in binutils 2.20.51.0.11.

The "right" solution:
- update to binutils >= 2.20.51.0.11
- rebuild glibc and other affected packages.
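
To get a rough idea of whether a binary contains the instruction, one can search its machine code for the NOPL encoding. The multi-byte NOP is encoded as opcode bytes 0F 1F followed by a ModRM byte whose reg field is 0. The Python sketch below is a plain byte scan, not a disassembler, so it can report false positives where those bytes are data or the tail of another instruction; the function name and the sample bytes are made up for illustration.

```python
def find_nopl_candidates(code):
    """Return offsets where the bytes 0F 1F are followed by a ModRM byte with reg == 0."""
    hits = []
    for i in range(len(code) - 2):
        modrm = code[i + 2]
        reg_field = (modrm >> 3) & 0x7  # bits 5..3 of the ModRM byte
        if code[i] == 0x0F and code[i + 1] == 0x1F and reg_field == 0:
            hits.append(i)
    return hits


# Hand-written sample: nop; nopl 0x0(%eax,%eax,1); ret
sample = bytes([0x90, 0x0F, 0x1F, 0x44, 0x00, 0x00, 0xC3])
print(find_nopl_candidates(sample))

# A sequence that uses only the classic one-byte NOP (0x90)
print(find_nopl_candidates(bytes([0x90, 0x90, 0xC3])))
```

For a real check, a disassembler's output is the more reliable source, since it decodes instructions at their actual boundaries.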
Comment by Allan McRae (Allan) - Wednesday, 29 September 2010, 10:38 GMT
We are not upgrading to a prerelease binutils. Next time I do a toolchain rebuild I will pull the relevant patches on binutils-2.20. That probably will not happen for a couple of weeks.
Comment by Allan McRae (Allan) - Wednesday, 06 October 2010, 23:29 GMT
Workaround patch added to glibc-2.12.1-2 in [testing]
Comment by markus (markuman) - Friday, 22 October 2010, 09:17 GMT
Booting and running works fine with this workaround fix,
but then the system boots as i586 and you can't install/update packages from the repo anymore.
Comment by Allan McRae (Allan) - Friday, 22 October 2010, 09:18 GMT
Ummm... what? I need more explanation here but if your system thinks it is an i586 that is kernel related and not glibc related.
Comment by Paweł D. (totalizator) - Friday, 22 October 2010, 09:26 GMT
I've recently upgraded my system and installed glibc-2.12.1-2 and finally everything works. No brick. No problems with packages install/update. Thanks!
Comment by Markus Golser (elmargol) - Friday, 22 October 2010, 09:30 GMT
@Pawel what CPU do you have?
Comment by Paweł D. (totalizator) - Friday, 22 October 2010, 09:41 GMT
@Markus: VIA Nehemiah http://pastebin.com/gYs5UPJH
Comment by markus (markuman) - Friday, 22 October 2010, 12:48 GMT
If I boot the default core kernel with the new glibc, Arch boots as i586. If I boot my custom kernel with the NOPL patch, Arch boots as i686. The CPU is a Geode LX 800.
Comment by Allan McRae (Allan) - Friday, 22 October 2010, 12:52 GMT
what do you mean it boots as i586? uname says i586?
Comment by markus (markuman) - Friday, 22 October 2010, 14:27 GMT
yes, uname -a said i586
Comment by Markus Golser (elmargol) - Friday, 22 October 2010, 15:15 GMT
Thank you, everything works on my system again:


cat /proc/cpuinfo
processor : 0
vendor_id : CentaurHauls
cpu family : 6
model : 9
model name : VIA Nehemiah
stepping : 8
cpu MHz : 666.549
cache size : 64 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr cx8 sep mtrr pge cmov pat mmx fxsr sse up rng rng_en ace ace_en
bogomips : 1333.64
clflush size : 32
cache_alignment : 32
address sizes : 32 bits physical, 32 bits virtual
power management:
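
A listing like this can also be checked programmatically. The Python sketch below extracts the flags line and looks for "cmov", the feature the earlier comments tie to i686 compatibility. The sample text is copied from the output above. Whether a given kernel reports a separate "nopl" flag is an assumption to verify on the machine in question; it does not appear in this listing.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from /proc/cpuinfo-style text (first flags line)."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


# Excerpt of the output posted above; on a live system the text would come
# from reading /proc/cpuinfo instead.
SAMPLE = """\
model name : VIA Nehemiah
flags : fpu vme de pse tsc msr cx8 sep mtrr pge cmov pat mmx fxsr sse up rng rng_en ace ace_en
"""

flags = cpu_flags(SAMPLE)
print("cmov" in flags)
print("nopl" in flags)
```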
Comment by Allan McRae (Allan) - Friday, 22 October 2010, 22:43 GMT
@markus: please post the entire uname -a.
Comment by Markus Golser (elmargol) - Saturday, 23 October 2010, 05:59 GMT
Linux nas 2.6.35-ARCH #1 SMP PREEMPT Wed Sep 29 07:17:20 UTC 2010 i686 VIA Nehemiah CentaurHauls GNU/Linux
Comment by Allan McRae (Allan) - Saturday, 23 October 2010, 06:46 GMT
Sorry, I meant the other Markus (markuman).
Comment by markus (markuman) - Saturday, 23 October 2010, 10:32 GMT
@Allan
here are some outputs from core kernel and my own nopl emu kernel.
http://files.osuv.de/geode/outputs/
Comment by Allan McRae (Allan) - Saturday, 23 October 2010, 11:34 GMT
For the moment, get rid of Architecture=auto in your pacman.conf and you will be able to install packages again.

As far as I can tell, this is due to the kernel not believing your system is i686. Not much that can be done about that from a toolchain perspective... maybe a kernel update will change that or when the proper fix gets released in binutils (late November). I suggest you keep using a custom kernel until that point.
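
The effect described here can be illustrated with a small Python sketch. It assumes that with Architecture=auto the package manager compares each package's architecture against the machine name the kernel reports (the same value `uname -m` prints); the function `package_installable` is hypothetical and is not pacman's actual code.

```python
import platform


def package_installable(pkg_arch, machine):
    """Hypothetical check: a package fits if it is arch-independent or matches the machine."""
    return pkg_arch in ("any", machine)


# The machine name reported by the running kernel, e.g. "i686" or "i586".
machine = platform.machine()
print(machine)

# With a kernel that reports i586, i686 packages would be rejected:
print(package_installable("i686", "i586"))
# With the architecture fixed to i686 in the configuration, they are accepted:
print(package_installable("i686", "i686"))
```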

Comment by markus (markuman) - Sunday, 24 October 2010, 19:54 GMT
Alright, I guess you can close this bug again :D
because it was only opened for the VIA Nehemiah.
Geode users have to use the NOPL patch and wait for the binutils major update or, as you said, ignore the system architecture.
