FS#46508 - [flightgear] crash at startup ( illegal instruction )

Attached to Project: Community Packages
Opened by patrick (potomac) - Thursday, 01 October 2015, 13:01 GMT
Last edited by Evangelos Foutras (foutrelis) - Wednesday, 14 October 2015, 08:23 GMT
Task Type Bug Report
Category Packages
Status Closed
Assigned To Sergej Pupykin (sergej)
Evangelos Foutras (foutrelis)
Architecture All
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 1
Private No

Details

Description:

since a recent "pacman -Syu" flightgear doesn't work, it crashes a few seconds after the splash screen is displayed,

the error message in the console is :

Illegal instruction (core dumped)

flightgear worked 2 months ago, but since a recent upgrade of some packages ( gcc 5, mesa 11.x, kernel 4.2 ) it doesn't work, I don't know exactly which package is the culprit,

maybe a rebuild of flightgear and simgear can help,

I have a radeon HD4650 PCI with the open source driver ( radeon ), maybe the bug doesn't occur with an nvidia graphics card



Additional info:
* package version(s) flightgear 3.4.0-2, mesa 11.0.2-1, linux 4.2.1-1, xorg-server 1.17.2-4, gcc-multilib 5.2.0-2

* config and/or log files etc.


Steps to reproduce:
- install flightgear
- update your system
- start flightgear, it will crash a few seconds later, while the splash screen is displayed

Closed by  Evangelos Foutras (foutrelis)
Wednesday, 14 October 2015, 08:23 GMT
Reason for closing:  Duplicate
Additional comments about closing:   FS#46706 
Comment by Doug Newgard (Scimmia) - Thursday, 01 October 2015, 15:22 GMT
i686 system?
Comment by patrick (potomac) - Thursday, 01 October 2015, 16:18 GMT
no, I use archlinux 64 bits,

I tried to rebuild the flightgear package but it doesn't solve the problem,

I also tried to downgrade the kernel but it doesn't help, I think the culprit could be mesa 11,

for an unknown reason it's impossible to downgrade to mesa 10.6.7 ( the previous version ), if I do this then all openGL software crashes,

I also tried to rebuild mesa 10.6.7 with makepkg but the compilation fails, gcc reports problems in the source code related to LLVM and some errors about undeclared functions,

I am quite sure that flightgear is not really compatible with mesa 11, maybe flightgear uses openGL in a bad way, and with mesa 11 this may trigger a bug,

currently I use mesa 11, and other 3D games work without problems ( Xonotic for example )
Comment by patrick (potomac) - Thursday, 01 October 2015, 16:23 GMT
I tried to use gdb with a debug version of flightgear, but the backtrace doesn't help :

(gdb) bt full
#0 0x00007ffff7fc45eb in ?? ()
No symbol table info available.
#1 0x0000000001cb2400 in ?? ()
No symbol table info available.
#2 0x00007fffe5d4d740 in ?? ()
No symbol table info available.
#3 0x0000000000000007 in ?? ()
No symbol table info available.
#4 0x0000000000000000 in ?? ()
No symbol table info available.
(gdb) cont
Continuing.
[Thread 0x7fffeb18c700 (LWP 15280) exited]
[Thread 0x7fffcd042700 (LWP 15279) exited]
[Thread 0x7fffdffff700 (LWP 15277) exited]
[Thread 0x7fffe6755700 (LWP 15271) exited]
[Thread 0x7ffff7ee5800 (LWP 15262) exited]

Program terminated with signal SIGILL, Illegal instruction.
Comment by patrick (potomac) - Thursday, 01 October 2015, 16:48 GMT
with the "thread apply all bt full" command I get a more useful backtrace, you can check it in the attachment
Comment by patrick (potomac) - Thursday, 01 October 2015, 17:40 GMT
I created a "trace" file with apitrace, see the attachment,

you can replay the trace with the "apitrace" software ( or "qapitrace" if you want a GUI ), it's a tool that can help openGL developers track down the bug,

I also created a bug report in mesa's bugzilla :

https://bugs.freedesktop.org/show_bug.cgi?id=92214

and also in flightgear's bug tracker :

https://sourceforge.net/p/flightgear/codetickets/1803/

I think the bug is related to mesa 11 ( the r600 driver module in mesa ), something went wrong in mesa 11, but the culprit could also be flightgear if some openGL calls are made in a bad way
Comment by patrick (potomac) - Friday, 02 October 2015, 02:33 GMT
in fact the real culprit is llvm-3.7.0-4 and llvm-libs-3.7.0-4,

because if I downgrade llvm and llvm-libs to version 3.6.2-4 and rebuild the mesa 11.0.2 packages against llvm 3.6.2, then all is ok: flightgear does not crash, and I can also run with LIBGL_ALWAYS_SOFTWARE=1 without problems
Comment by patrick (potomac) - Sunday, 11 October 2015, 15:22 GMT
I noticed a weird hack in mesa's PKGBUILD :

# Fix detection of libLLVM when built with CMake
sed -i 's/LLVM_SO_NAME=.*/LLVM_SO_NAME=LLVM/' configure

https://projects.archlinux.org/svntogit/packages.git/tree/trunk/PKGBUILD?h=packages/mesa

the archlinux maintainer of mesa has probably noticed that the "configure" file was written to detect the llvm libs only if the version of llvm is 3.6.2 or below,

because the name of the .so file has changed since version 3.7.0 ( "/usr/lib/libLLVM.so" for LLVM 3.7.0, and "/usr/lib/libLLVM-3.6.2.so" for LLVM 3.6.2 ),

this hack ( altering the configure file of mesa ) could be a bad idea if the mesa developers have not really tested the LLVM 3.7.0 libs: version 3.7.0 may have some changes ( API, functions ) which also require changes in the mesa source code, and we don't know if these changes were made,

it could explain my bug if mesa 11.0.3 was not ready for general use with the llvm 3.7.0 libs
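
To make the effect of that substitution concrete, here is a small sketch that applies the same sed expression to a single sample line. The sample assignment is an assumption made up for illustration (the real line in mesa's configure may look different), and fix_so_name is just a hypothetical wrapper name.

```sh
#!/bin/sh
# Apply the sed expression from the PKGBUILD to whatever arrives on stdin.
fix_so_name() {
    sed 's/LLVM_SO_NAME=.*/LLVM_SO_NAME=LLVM/'
}

# Hypothetical input line, not copied from mesa's configure.
echo 'LLVM_SO_NAME=LLVM-3.6.2' | fix_so_name
```

Whatever versioned name configure would have computed is replaced with the unversioned "LLVM". The substitution only changes which library name is looked up; it says nothing about whether mesa behaves the same with both LLVM versions.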

Comment by patrick (potomac) - Monday, 12 October 2015, 21:24 GMT
I made an interesting discovery : the bug also occurs in a virtual machine ( qemu i686, guest OS : archlinux i686 ),

in this virtual machine it's not the r600 driver that is used, it's the swrast_dri.so file ( 100% software emulation, no 3D acceleration ),

in this virtual machine all openGL programs crash ( glxgears for example ) with the error "illegal instruction",

this qemu i686 virtual machine runs on my PC ( host OS : archlinux 64 bits, CPU : pentium dual core E6800 ),

glxinfo for this qemu VM :

name of display: :0
display: :0 screen: 0
direct rendering: Yes
server glx vendor string: SGI
server glx version string: 1.4
OpenGL vendor string: VMware, Inc.
OpenGL renderer string: Gallium 0.4 on llvmpipe (LLVM 3.7, 128 bits)
OpenGL version string: 3.0 Mesa 11.0.3
OpenGL shading language version string: 1.30

log of Xorg :

[ 13.255] (WW) Open ACPI failed (/var/run/acpid.socket) (No such file or directory)
[ 13.533] (II) Loading /usr/lib/xorg/modules/extensions/libglx.so
[ 13.962] (II) Loading /usr/lib/xorg/modules/drivers/vmware_drv.so
[ 14.948] (II) Loading /usr/lib/xorg/modules/drivers/modesetting_drv.so
[ 14.978] (II) Loading /usr/lib/xorg/modules/drivers/fbdev_drv.so
[ 15.010] (II) Loading /usr/lib/xorg/modules/drivers/vesa_drv.so
[ 15.060] (II) Loading /usr/lib/xorg/modules/libfbdevhw.so
[ 15.270] (II) Loading /usr/lib/xorg/modules/libvgahw.so
[ 15.281] (==) vmware(0): Using HW cursor
[ 15.282] (II) Loading /usr/lib/xorg/modules/libfb.so
[ 15.367] (II) Loading /usr/lib/xorg/modules/libshadowfb.so

[ 15.053] (II) vmware: driver for VMware SVGA: vmware0405, vmware0710
[ 15.053] (II) FBDEV: driver for framebuffer: fbdev
[ 15.053] (II) VESA: driver for VESA chipsets: vesa

the mesa driver seems to be swrast_dri.so,

the backtrace is still the same :

Starting program: /usr/bin/glxgears
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
[New Thread 0xb450eb40 (LWP 839)]
[New Thread 0xb3d0db40 (LWP 840)]

Program received signal SIGILL, Illegal instruction.
0xb7fd2091 in ?? ()

Thread 3 (Thread 0xb3d0db40 (LWP 840)):
#0 0xb7fdbbc8 in __kernel_vsyscall ()
No symbol table info available.
#1 0xb7a7da2b in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
No symbol table info available.
#2 0xb7c8de4d in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libc.so.6
No symbol table info available.
#3 0xb757940a in ?? () from /usr/lib/xorg/modules/dri/swrast_dri.so
No symbol table info available.
#4 0xb7579275 in ?? () from /usr/lib/xorg/modules/dri/swrast_dri.so
No symbol table info available.
#5 0xb7a78315 in start_thread () from /usr/lib/libpthread.so.0
No symbol table info available.
#6 0xb7c80e1e in clone () from /usr/lib/libc.so.6
No symbol table info available.

Thread 2 (Thread 0xb450eb40 (LWP 839)):
#0 0xb7fdbbc8 in __kernel_vsyscall ()
No symbol table info available.
#1 0xb7a7da2b in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
No symbol table info available.
#2 0xb7c8de4d in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libc.so.6
No symbol table info available.
#3 0xb757940a in ?? () from /usr/lib/xorg/modules/dri/swrast_dri.so
No symbol table info available.
#4 0xb7579275 in ?? () from /usr/lib/xorg/modules/dri/swrast_dri.so
No symbol table info available.
#5 0xb7a78315 in start_thread () from /usr/lib/libpthread.so.0
No symbol table info available.
#6 0xb7c80e1e in clone () from /usr/lib/libc.so.6
No symbol table info available.

Thread 1 (Thread 0xb7a5f700 (LWP 838)):
#0 0xb7fd2091 in ?? ()
No symbol table info available.
#1 0xb7367986 in ?? () from /usr/lib/xorg/modules/dri/swrast_dri.so
No symbol table info available.
#2 0xb7367d36 in ?? () from /usr/lib/xorg/modules/dri/swrast_dri.so
No symbol table info available.
#3 0xb729bc19 in ?? () from /usr/lib/xorg/modules/dri/swrast_dri.so
No symbol table info available.
#4 0xb72942e3 in ?? () from /usr/lib/xorg/modules/dri/swrast_dri.so
No symbol table info available.
#5 0xb72948c6 in ?? () from /usr/lib/xorg/modules/dri/swrast_dri.so
No symbol table info available.
#6 0xb7577813 in ?? () from /usr/lib/xorg/modules/dri/swrast_dri.so
No symbol table info available.
#7 0xb72820fd in ?? () from /usr/lib/xorg/modules/dri/swrast_dri.so
No symbol table info available.
#8 0xb713d166 in ?? () from /usr/lib/xorg/modules/dri/swrast_dri.so
No symbol table info available.
#9 0xb7125f5a in ?? () from /usr/lib/xorg/modules/dri/swrast_dri.so
No symbol table info available.
#10 0xb6ff0600 in ?? () from /usr/lib/xorg/modules/dri/swrast_dri.so
No symbol table info available.
#11 0xb7004b40 in ?? () from /usr/lib/xorg/modules/dri/swrast_dri.so
No symbol table info available.
#12 0x08049fdb in ?? ()
No symbol table info available.
#13 0x080496ca in ?? ()
No symbol table info available.
#14 0xb7baf497 in __libc_start_main () from /usr/lib/libc.so.6
No symbol table info available.
#15 0x08049d0a in ?? ()
No symbol table info available.
Comment by patrick (potomac) - Tuesday, 13 October 2015, 02:10 GMT
another discovery :

in qemu I can set a type of CPU ( pentium, pentium2, core2duo, SandyBridge and many more ), you can see the list of CPUs with the command "qemu-i386 -cpu ?",

until now I used the qemu option "-cpu host", which means that it's the CPU of the host that is emulated ( my pentium dual core E6800 ),

then I decided to set a different CPU name in my qemu script :

-cpu core2duo -enable-kvm -machine type=pc,accel=kvm -smp 2

with this setting the bug disappears, all is ok in my virtual machine, glxgears and all openGL programs run without crashing, the mesa driver llvmpipe doesn't crash,

after that I decided to set yet another CPU in qemu :

-cpu Penryn -enable-kvm -machine type=pc,accel=kvm -smp 2

with the "Penryn" CPU the bug is back in my virtual machine, which means that the bug seems related to the type of CPU, the llvm 3.7.0 lib may have a bug when it generates binary code, which fails on some CPUs,

this problem doesn't exist with llvm 3.6.2 lib
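
The comparison above can be repeated over several CPU models with a small wrapper. This is only a sketch: the disk image name arch-i686.img is hypothetical, qemu-system-i386 is assumed to be the emulator binary, and the script prints the command lines instead of booting anything.

```sh
#!/bin/sh
# Print the qemu command line for one CPU model (dry run, nothing is started).
qemu_cmd() {
    # $1 = CPU model, $2 = disk image (hypothetical default name)
    echo "qemu-system-i386 -cpu $1 -enable-kvm -machine type=pc,accel=kvm -smp 2 -hda ${2:-arch-i686.img}"
}

# The three models tried in this report.
for model in host core2duo Penryn; do
    qemu_cmd "$model"
done
```

Booting the guest with each printed command and running glxgears inside it shows which models trigger the crash; in the tests described here, core2duo worked while host and Penryn did not.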
Comment by patrick (potomac) - Wednesday, 14 October 2015, 06:38 GMT
I found the cause of this bug,

it's llvm 3.7.0, the llvm git commit that introduced this bug is :

cd83d5b5071f072882ad06cc4b904b2d27d1e54a

https://github.com/llvm-mirror/llvm/commit/cd83d5b5071f072882ad06cc4b904b2d27d1e54a

the problem is that llvm 3.7.0 treats my pentium dual core as a "penryn",

penryn supports SSE4.1, but the pentium dual core series does not, even though it reports the same CPU family 6 model 23,
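
A quick way to check this on a given machine, as a minimal sketch assuming a Linux host whose /proc/cpuinfo exposes the usual x86 fields ( "cpu family", "model", "flags" ). The helper names cpu_field and has_flag are made up for illustration.

```sh
#!/bin/sh
# Sketch only: helper names are illustrative, not part of any tool.

# Print the value of the first "<name> : <value>" line in a cpuinfo-style file.
cpu_field() {
    # $1 = field name, $2 = file (defaults to /proc/cpuinfo)
    awk -F ':' -v name="$1" '
        { key = $1; gsub(/[ \t]+$/, "", key) }
        key == name { val = $2; gsub(/^[ \t]+/, "", val); print val; exit }
    ' "${2:-/proc/cpuinfo}"
}

# Succeed if the "flags" line contains the given flag as a whole word.
has_flag() {
    # $1 = flag name, $2 = file (defaults to /proc/cpuinfo)
    cpu_field flags "${2:-/proc/cpuinfo}" | tr ' ' '\n' | grep -qxF "$1"
}

if [ -r /proc/cpuinfo ]; then
    echo "family: $(cpu_field 'cpu family')  model: $(cpu_field model)"
    if has_flag sse4_1; then echo "SSE4.1: yes"; else echo "SSE4.1: no"; fi
fi
```

If the analysis here is right, an affected CPU should report family 6 and model 23 but no sse4_1 flag.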

the faulty commit deleted a test for SSE4.1 :

return HasSSE41 ? "penryn" : "core2";

the solution is simply to add this test back for CPU family 6 model 23, I created a patch that solves this bug :

--- a/lib/Support/Host.cpp 2015-10-14 07:13:52.381374679 +0200
+++ b/lib/Support/Host.cpp 2015-10-14 07:13:28.224708323 +0200
@@ -332,6 +332,8 @@
// 17h. All processors are manufactured using the 45 nm process.
//
// 45nm: Penryn , Wolfdale, Yorkfield (XE)
+ // Not all Penryn processors support SSE 4.1 (such as the Pentium brand)
+ return HasSSE41 ? "penryn" : "core2";
case 29: // Intel Xeon processor MP. All processors are manufactured using
// the 45 nm process.
return "penryn";

this patch has been sent to llvm's bugzilla, I hope they will accept it
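
The effect of the patch can be modelled outside LLVM. The sketch below is not LLVM code: pick_cpu_name is a hypothetical shell function that mirrors only the family 6 / model 23 decision discussed above, with and without the SSE4.1 test.

```sh
#!/bin/sh
# Hypothetical model of the CPU-name choice for family 6, model 23.
# $1 = "yes" if the CPU reports SSE4.1, anything else otherwise
# $2 = "patched" (SSE4.1 test present) or "unpatched" (test removed)
pick_cpu_name() {
    if [ "$2" = "patched" ] && [ "$1" != "yes" ]; then
        echo core2      # no SSE4.1: fall back to core2
    else
        echo penryn     # SSE4.1 present, or the test is missing
    fi
}

pick_cpu_name no unpatched   # prints "penryn"
pick_cpu_name no patched     # prints "core2"
pick_cpu_name yes patched    # prints "penryn"
```

Without the test, a model 23 CPU that lacks SSE4.1 is still labelled "penryn", so code generated for that name may contain SSE4.1 instructions the CPU cannot execute, which would match the SIGILL seen here.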
