FS#14720 - GHC 6.10.3 Illegal Instruction errors with new glibc

Attached to Project: Arch Linux
Opened by Alexander Dunlap (ajd) - Friday, 15 May 2009, 02:01 GMT
Last edited by Allan McRae (Allan) - Sunday, 17 May 2009, 05:30 GMT
Task Type Bug Report
Category Packages: Extra
Status Closed
Assigned To Vesa Kaihlavirta (vegai), Allan McRae (Allan)
Architecture i686
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:

I am using GHC 6.10.3 on i686. When I upgraded to the new glibc (2.10.1), GHC stopped working, reporting errors of "Illegal Instruction" (presumably from the kernel). Reinstalling GHC didn't fix the problem. I don't think this is an upstream problem; other Haskell users are not encountering the bug.

Steps to reproduce:
Type 'ghci' at a bash prompt. I got the following output:

$ ghci
GHCi, version 6.10.3: http://www.haskell.org/ghc/ :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer ... linking ... done.
Loading package base ... linking ... done.
Illegal instruction

The bug also occurs when running GHC on a file (ghc --version doesn't trigger it, but compiling a file does). Interestingly, the Illegal Instruction message seems to appear only after a certain amount of compilation has taken place. For example:

$ ghc test.hs

test.hs:7:0:
Illegal instance declaration for `Num (a -> a)'
(All instance types must be of the form (T a1 ... an)
where a1 ... an are type *variables*,
and each type variable appears at most once in the instance head.
Use -XFlexibleInstances if you want to disable this.)
In the instance declaration for `Num (a -> a)'
$ vi test.hs # Fix the problem
$ ghc test.hs

test.hs:1:0: The function `main' is not defined in module `Main'
$ vi test.hs # Fix that problem; test.hs is now error-free
$ ghc test.hs
Illegal instruction

Also, binaries compiled by previous (now-uninstalled) GHC versions report "Illegal Instruction" when run. I have the following session:

$ ldd Tests
linux-gate.so.1 => (0xb7fba000)
libuuid.so.1 => /lib/libuuid.so.1 (0xb7fa3000)
librt.so.1 => /lib/librt.so.1 (0xb7f9a000)
libutil.so.1 => /lib/libutil.so.1 (0xb7f96000)
libdl.so.2 => /lib/libdl.so.2 (0xb7f92000)
libm.so.6 => /lib/libm.so.6 (0xb7f6c000)
libgmp.so.3 => /usr/lib/libgmp.so.3 (0xb7f21000)
libc.so.6 => /lib/libc.so.6 (0xb7dd7000)
/lib/ld-linux.so.2 (0xb7fbb000)
libpthread.so.0 => /lib/libpthread.so.0 (0xb7dbf000)
$ ./Tests
rvwords: Illegal instruction

where rvwords was the first part of the normal output of "Tests".

Thanks for looking into this.

Closed by  Allan McRae (Allan)
Sunday, 17 May 2009, 05:30 GMT
Reason for closing:  Fixed
Additional comments about closing:  gmp-4.3.1-2
Comment by Alexander Dunlap (ajd) - Friday, 15 May 2009, 02:06 GMT
Just as another note, I have gmp 4.3.0 which I have heard is buggy; Haskell also depends heavily on GMP. This might be relevant.
Comment by Gerardo Exequiel Pozzi (djgera) - Friday, 15 May 2009, 07:59 GMT
Can you run it under gdb and do a backtrace?

What is your CPU, and what are its flags?

I cannot reproduce it. (The only difference on my system is that I use a custom kernel.)
[djgera@gerardo ~]$ uname -m
i686
[djgera@gerardo ~]$ cat /proc/version
Linux version 2.6.29.3 (root@gerardo) (gcc version 4.4.0 20090505 (prerelease) (GCC) ) #1 SMP PREEMPT Fri May 15 04:43:18 ART 2009
[djgera@gerardo ~]$ egrep -m2 "model name|flags" /proc/cpuinfo
model name : AMD Athlon(tm) 64 X2 Dual Core Processor 5200+
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
[djgera@gerardo ~]$ ghci
GHCi, version 6.10.3: http://www.haskell.org/ghc/ :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer ... linking ... done.
Loading package base ... linking ... done.
Prelude>

Comment by Jan de Groot (JGC) - Friday, 15 May 2009, 08:31 GMT
This is related to gmp. Updating to 4.3.1 or downgrading will fix this bug. Gcc has the same problem on i686.
Comment by Gerardo Exequiel Pozzi (djgera) - Friday, 15 May 2009, 08:36 GMT
GMP, OK. So it only happens on some systems, maybe on Intel? AMD isn't affected.

It also runs fine under VirtualBox with all Arch Linux packages (all the latest packages from testing).

[djgera@arch32 ~]$ pacman -Qi gmp gcc glibc ghc binutils kernel26 | egrep "Name|Version"
Name : gmp
Version : 4.3.0-1
Name : gcc
Version : 4.4.0-2
Name : glibc
Version : 2.10.1-1
Name : ghc
Version : 6.10.3-1
Name : binutils
Version : 2.19.1-3
Name : kernel26
Version : 2.6.29.3-1
[djgera@arch32 ~]$ ghci
GHCi, version 6.10.3: http://www.haskell.org/ghc/ :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer ... linking ... done.
Loading package base ... linking ... done.
Prelude>
Comment by Allan McRae (Allan) - Friday, 15 May 2009, 13:48 GMT
gmp-4.3.1 is now pushed to [testing]. Hopefully that will fix this... I didn't realize quite how broken gmp-4.3.0 was.
Comment by Alexander Dunlap (ajd) - Saturday, 16 May 2009, 15:49 GMT
I have upgraded to gmp-4.3.1; unfortunately, the bug persists.

alex@chillynight:~
$ uname -m
i686
alex@chillynight:~
$ cat /proc/version
Linux version 2.6.29-ARCH (root@T-POWA-LX) (gcc version 4.4.0 (GCC) ) #1 SMP PREEMPT Sat May 9 12:47:43 UTC 2009
alex@chillynight:~
$ egrep -m2 "model name|flags" /proc/cpuinfo
model name : Pentium III (Coppermine)
flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr sse up
alex@chillynight:~
$ ghci
GHCi, version 6.10.3: http://www.haskell.org/ghc/ :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer ... linking ... done.
Loading package base ... linking ... done.
Illegal instruction
alex@chillynight:~
$ pacman -Qi gmp gcc glibc ghc binutils kernel26 | egrep "Name|Version"
Name : gmp
Version : 4.3.1-1
Name : gcc
Version : 4.4.0-2
Name : glibc
Version : 2.10.1-1
Name : ghc
Version : 6.10.3-1
Name : binutils
Version : 2.19.1-3
Name : kernel26
Version : 2.6.29.3-1
alex@chillynight:~
$ gdb --args '/usr/lib/ghc-6.10.3/ghc' --interactive -B/usr/lib/ghc-6.10.3/. -dynload wrapped
GNU gdb 6.8
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...
(no debugging symbols found)
(gdb) run
Starting program: /usr/lib/ghc-6.10.3/ghc --interactive -B/usr/lib/ghc-6.10.3/. -dynload wrapped
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
[Thread debugging using libthread_db enabled]
(no debugging symbols found)
[New Thread 0xb7e7d8d0 (LWP 16493)]
[New Thread 0xb7cffb70 (LWP 16496)]
[New Thread 0xb74feb70 (LWP 16497)]
GHCi, version 6.10.3: http://www.haskell.org/ghc/ :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer ... linking ... done.
Loading package base ... linking ... done.

Program received signal SIGILL, Illegal instruction.
[Switching to Thread 0xb7e7d8d0 (LWP 16493)]
0xb80031c2 in __gmpn_mul_1 () from /usr/lib/libgmp.so.3
(gdb) backtrace
#0 0xb80031c2 in __gmpn_mul_1 () from /usr/lib/libgmp.so.3
#1 0xb7ff9ade in __gmpz_mul () from /usr/lib/libgmp.so.3
#2 0x08bf8b44 in ?? ()
#3 0x08d84360 in ?? ()
#4 0x08d84348 in ?? ()
#5 0x08d84354 in ?? ()
#6 0x00000000 in ?? ()
(gdb)

I'm not experienced with GDB, but this seems to indicate that the problem is still in GMP.
Comment by Alexander Dunlap (ajd) - Saturday, 16 May 2009, 15:56 GMT
If I downgrade gmp to 4.2.4 from core, then everything works fine. So only GMP versions 4.3.0 and 4.3.1 are broken.
Comment by Allan McRae (Allan) - Saturday, 16 May 2009, 16:05 GMT
As I cannot replicate this, it is very difficult for me to find a fix... Can you please try:
1) Downgrading gmp to the version in [core] (4.2.x).
2) Rebuilding ghc with gmp-4.3.1
and report success/failure?
Comment by Allan McRae (Allan) - Saturday, 16 May 2009, 16:07 GMT
Well, I guess since there was a comment made while I was looking into this, rebuilding ghc is all that needs to be tested...
Comment by Alexander Dunlap (ajd) - Saturday, 16 May 2009, 16:19 GMT
Do you mean rebuilding Arch's copy of GHC (if so, could you point me to instructions for doing that?) or rebuilding a vanilla source tarball from GHC HQ?
Comment by Allan McRae (Allan) - Saturday, 16 May 2009, 16:26 GMT
I meant rebuild Arch's copy. Look at ABS and makepkg in the wiki for instructions.
Comment by Alexander Dunlap (ajd) - Saturday, 16 May 2009, 16:29 GMT
Okay, I did that, but now I have a problem. In order to build GHC with GMP 4.3.1, I need to upgrade GMP to 4.3.1. But GHC is self-hosting, and when I upgrade GMP to 4.3.1, my existing GHC does not work to compile the new one (!). (I actually tried this, and it didn't work, crashing with the "Illegal Instruction" message.) Is there a way to install a private copy of GMP? (I guess I'll try building that too...)
Comment by Celti Burroughs (Celti) - Saturday, 16 May 2009, 16:34 GMT
Building ghc with gmp 4.3.1 fails.

[1 of 1] Compiling Main ( ifBuildable.hs, ifBuildable.o )
/bin/sh: line 1: 25054 Illegal instruction /usr/bin/ghc -Wall --make ifBuildable -o ifBuildable
make[1]: *** [ifBuildable/ifBuildable] Error 132
make[1]: Leaving directory `/var/abs/extra/ghc/src/ghc-6.10.3/libraries'
make: *** [stage1] Error 2

I'm running an entirely up-to-date testing system - -Syu'ed before building.
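
The "Error 132" in the make output above is consistent with the "Illegal instruction" message: by shell convention, an exit status of 128 plus N means the process was killed by signal N, and SIGILL is signal 4 on Linux. A minimal sketch of decoding such a status follows; signal_from_status is a hypothetical helper name, not part of make or GHC.

```shell
# Decode an exit status into the signal that killed the process, if any.
# signal_from_status is a hypothetical helper used only for illustration.
signal_from_status() {
    code=$1
    if [ "$code" -gt 128 ]; then
        # 128 + N conventionally means "terminated by signal N".
        kill -l $((code - 128))
    else
        echo "exited with status $code, not killed by a signal"
    fi
}

# The status reported by make in the build log above.
signal_from_status 132
```

On a Linux system this should name the ILL signal for status 132, matching the SIGILL that gdb reported earlier in this task.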
Comment by Allan McRae (Allan) - Sunday, 17 May 2009, 00:26 GMT
Vesa - can you replicate?
Comment by Gerardo Exequiel Pozzi (djgera) - Sunday, 17 May 2009, 02:50 GMT
@Alex: aha! Your CPU doesn't support SSE2, and the code executes instructions that are SSE2.

If you run gdb as above, then when it receives SIGILL, note the address and type "disas __gmpn_mul_1". What instruction appears at that address? (Please do not paste the whole function, only the instruction at the address that fails.)

I can see some instructions that are SSE2, like pmuludq, paddq, and others... (from gmp)

Maybe recompiling gmp on your machine, which doesn't support SSE2, will produce a working build ;)
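
The SSE2 observation can be checked against the flags lines posted in this task. The sketch below is only an illustration: has_cpu_flag is a hypothetical helper name, and the flags string is copied from the Pentium III output earlier in the report.

```shell
# has_cpu_flag FLAG "FLAGS LINE": succeed if FLAG appears as a whole word.
# has_cpu_flag is a hypothetical helper, not an existing tool.
has_cpu_flag() {
    flag=$1
    flags_line=$2
    case " $flags_line " in
        *" $flag "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Flags reported for the Pentium III (Coppermine) earlier in this task.
p3_flags="fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr sse up"

if has_cpu_flag sse2 "$p3_flags"; then
    echo "sse2: supported"
else
    echo "sse2: not supported"
fi
```

Matching on whole words matters here, because the Pentium III does list "sse" but not "sse2". On a live system the second argument could be the flags line itself, for example "$(grep -m1 '^flags' /proc/cpuinfo)".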
Comment by Allan McRae (Allan) - Sunday, 17 May 2009, 03:09 GMT
@djgera: that seems to confirm the suspicion I had when I saw ajd was using a Pentium III. So I guess I need to fix the build... Looking into it now.
Comment by Alexander Dunlap (ajd) - Sunday, 17 May 2009, 03:27 GMT
$ gdb --args '/usr/lib/ghc-6.10.3/ghc' --interactive -B/usr/lib/ghc-6.10.3/. -dynload wrapped

... SNIP ...

Program received signal SIGILL, Illegal instruction.
[Switching to Thread 0xb7d938d0 (LWP 8786)]
0xb7f191c2 in __gmpn_mul_1 () from /usr/lib/libgmp.so.3
(gdb) disas __gmpn_mul_1

... SNIP ...

0xb7f191c2 <__gmpn_mul_1+34>: pmuludq %mm7,%mm0

... SNIP ...

So yes, it appears to be pmuludq.

I also downloaded the gmp-4.3.1 PKGBUILD from svn, built it, and installed it. GHC now works fine. The problem only exists in the gmp that I installed from testing.

Again, thank you very much to everyone for looking into this.
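
For anyone who wants to check a library without going through gdb, a rough sketch is to filter a disassembly for the SSE2 mnemonics named in this thread. find_sse2_insns is a hypothetical helper, and the two mnemonics are only the ones mentioned here, not a complete list of SSE2 instructions.

```shell
# find_sse2_insns: read disassembly text on stdin and print the lines that
# use the SSE2 mnemonics named in this thread (pmuludq, paddq).
# Hypothetical helper; the mnemonic list is illustrative, not exhaustive.
find_sse2_insns() {
    grep -wE 'pmuludq|paddq'
}

# The faulting line from the gdb session above.
printf '%s\n' '0xb7f191c2 <__gmpn_mul_1+34>: pmuludq %mm7,%mm0' | find_sse2_insns
```

Assuming binutils is installed, the same filter could be fed from "objdump -d /usr/lib/libgmp.so.3". No output would suggest that these particular instructions are absent from the build.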
Comment by Gerardo Exequiel Pozzi (djgera) - Sunday, 17 May 2009, 03:32 GMT
Yes, I just rebuilt the gmp package on my machine, and there are no SSE2 instructions in the code.

At build time I can see (relevant lines):

checking build system type... athlon64-pc-linux-gnu
checking host system type... athlon64-pc-linux-gnu

but next...
MPN_PATH=" x86/k7/mmx x86/k7 x86 generic"
So it doesn't use paths that contain SSE2 code, like "x86/pentium4/sse2" and "x86/p6/sse2". The problem is that it generates code for AMD :s

So force an MPN_PATH in the PKGBUILD to use the proper code for i686 ;)
Comment by Allan McRae (Allan) - Sunday, 17 May 2009, 03:34 GMT
It seems setting CFLAGS to "-march=i686 -mtune=generic" is not enough for gmp. It needs to have "./configure --build=$CARCH --host=$CARCH".

@ajd: I will upload a "fixed" version of gmp somewhere in a few minutes and provide a link. It would be great if you could test it.
Comment by Allan McRae (Allan) - Sunday, 17 May 2009, 03:43 GMT
In fact, adding --build=$CARCH makes MPN_PATH=" x86/p6 x86 generic" on i686, which is exactly what we want. So a new version of gmp is on its way to [testing] now.
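
As a rough sketch only, a PKGBUILD build() function with this change might look like the following. This is not the actual Arch PKGBUILD: the --prefix value and the overall layout are assumptions, and srcdir, pkgdir, pkgname, pkgver and CARCH are expected to come from the PKGBUILD and makepkg.

```shell
# Sketch of a PKGBUILD build() function, not the real gmp PKGBUILD.
# srcdir, pkgdir, pkgname, pkgver and CARCH are supplied by the PKGBUILD
# and makepkg; --prefix=/usr is an assumption for illustration.
build() {
    cd "$srcdir/$pkgname-$pkgver" || return 1
    # Pin the build and host type so that configure does not choose
    # assembly paths based on the CPU of the machine doing the build.
    ./configure --build="$CARCH" --host="$CARCH" --prefix=/usr || return 1
    make || return 1
    make DESTDIR="$pkgdir" install
}
```

The point of the change is that the selected MPN_PATH then depends on the declared architecture rather than on whichever CPU the packager happens to build on.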
Comment by Alexander Dunlap (ajd) - Sunday, 17 May 2009, 05:29 GMT
I just ran pacman -Syu, which downloaded gmp-4.3.1-2, and now GHC works fine. It looks like this one is squashed.
