FS#61797 - [lib32-mesa] Game crashes when mesa built with -O2

Attached to Project: Community Packages
Opened by Henri (Valta) Osmankäämi (cgx) - Monday, 18 February 2019, 08:43 GMT
Last edited by Toolybird (Toolybird) - Thursday, 20 April 2023, 22:50 GMT
Task Type Bug Report
Category Packages: Multilib
Status Closed
Assigned To Laurent Carlier (lordheavy)
Architecture All
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Short summary:
The game Penumbra: Overture (32-bit, running on 64-bit Arch) crashes with the regular mesa packages.
Recompiling lib32-mesa without -O2 in CFLAGS fixes it.

Please see this mesa bug report for full details; I won't duplicate everything here:
https://bugs.freedesktop.org/show_bug.cgi?id=109048

I opened another bug here because it might be a compiler bug, or somehow specific to Arch, as Timothy Arceri has not been able to reproduce it yet.

Package versions:
gcc 8.2.1+20181127-1
mesa 18.3.3-1 and lots of other versions from git and packages

Closed by  Toolybird (Toolybird)
Thursday, 20 April 2023, 22:50 GMT
Reason for closing:  Fixed
Additional comments about closing:  Assuming fixed as upstream issue was closed as "Fixed by <...commit...>".
Comment by loqs (loqs) - Monday, 18 February 2019, 20:27 GMT
Have you tried to determine which optimization flag or flags enabled by -O2 cause the issue?
Comment by Henri (Valta) Osmankäämi (cgx) - Tuesday, 19 February 2019, 08:00 GMT
loqs: Didn't think of that. I'll try
Comment by Henri (Valta) Osmankäämi (cgx) - Tuesday, 19 February 2019, 10:23 GMT
Well this is getting interesting now...
I've generated the optimization settings for optimization levels 1 and 2 as follows:
gcc -Q -m32 -O1 --help=optimizers
gcc -Q -m32 -O2 --help=optimizers

I've diffed the outputs, and according to the diff
export CFLAGS="
-O1
-falign-functions=16
-falign-jumps=16
-falign-labels
-falign-labels=0
-falign-loops=16
-fcaller-saves
-fcode-hoisting
-fcrossjumping
-fcse-follow-jumps
-fdevirtualize
-fdevirtualize-speculatively
-fexpensive-optimizations
-fgcse
-fhoist-adjacent-loads
-findirect-inlining
-finline-small-functions
-fipa-bit-cp
-fipa-cp
-fipa-icf
-fipa-icf-functions
-fipa-icf-variables
-fipa-ra
-fipa-sra
-fipa-vrp
-fisolate-erroneous-paths-dereference
-flra-remat
-foptimize-sibling-calls
-foptimize-strlen
-fpartial-inlining
-fpeephole2
-freorder-blocks-algorithm=stc
-freorder-blocks-and-partition
-freorder-functions
-frerun-cse-after-loop
-fschedule-insns2
-fstore-merging
-fstrict-aliasing
-fthread-jumps
-ftree-pre
-ftree-switch-conversion
-ftree-tail-merge
-ftree-vrp
-fvect-cost-model=cheap
"

should be identical to just -O2, but when compiled with these options, the crash won't happen...
The GCC manual says "Not all optimizations are controlled directly by a flag. Only optimizations that have a flag are listed in this section."
so maybe -O2 still does something that no individual flag enables?
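
For anyone repeating the comparison, here is a minimal sketch of how the two dumps could be diffed. The helper names normalize and opt_diff are made up for illustration, and the whitespace squeezing is an assumption about how the output is padded; it is not necessarily how the list above was produced.

```shell
#!/bin/sh
# Normalize one `gcc -Q --help=optimizers` dump: squeeze whitespace runs,
# trim both ends, and sort so that column padding never shows up as a change.
# (normalize and opt_diff are hypothetical helper names.)
normalize() {
    sed -e 's/[[:space:]]\{1,\}/ /g' -e 's/^ //' -e 's/ $//' "$1" | sort
}

# Print only the settings that differ between two dump files.
# Lines starting with "<" come from the first file, ">" from the second.
opt_diff() {
    a=$(mktemp); b=$(mktemp)
    normalize "$1" > "$a"
    normalize "$2" > "$b"
    diff "$a" "$b" | grep '^[<>]' || true
    rm -f "$a" "$b"
}

# Run the comparison only when gcc is installed.
if command -v gcc >/dev/null 2>&1; then
    o1=$(mktemp); o2=$(mktemp)
    gcc -Q -m32 -O1 --help=optimizers > "$o1" 2>/dev/null || true
    gcc -Q -m32 -O2 --help=optimizers > "$o2" 2>/dev/null || true
    opt_diff "$o1" "$o2"
    rm -f "$o1" "$o2"
fi
```

Every ">" line is a setting that -O2 reports differently from -O1. As the manual quote says, this only covers optimizations that have a flag, so an empty diff against a hand-built flag list would not prove the two builds are equivalent.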
Comment by Henri (Valta) Osmankäämi (cgx) - Tuesday, 19 February 2019, 12:33 GMT
The difference between the disassembly of bad and good radeonsi_dri.so, in function amdgpu_bo_map:

bad (compiled with -O2):
2b4827: e8 b4 f7 ff ff call 2b3fe0 <amdgpu_bo_wait>
2b482c: 83 c4 10 add $0x10,%esp
2b482f: e8 5c b4 55 00 call 80fc90 <os_time_get_nano>
2b4834: 89 44 24 18 mov %eax,0x18(%esp)
2b4838: 89 54 24 1c mov %edx,0x1c(%esp)
2b483c: 8b 46 54 mov 0x54(%esi),%eax
2b483f: f3 0f 7e 4c 24 18 movq 0x18(%esp),%xmm1
2b4845: f3 0f 7e 80 04 02 00 movq 0x204(%eax),%xmm0
2b484c: 00
2b484d: 66 0f d4 c1 paddq %xmm1,%xmm0
2b4851: 66 0f fb 04 24 psubq (%esp),%xmm0
2b4856: 66 0f d6 80 04 02 00 movq %xmm0,0x204(%eax)
2b485d: 00
2b485e: 66 90 xchg %ax,%ax
2b4860: 8b 56 5c mov 0x5c(%esi),%edx
2b4863: 66 0f ef c0 pxor %xmm0,%xmm0
2b4867: c7 44 24 28 00 00 00 movl $0x0,0x28(%esp)
2b486e: 00
2b486f: 66 0f 7e c5 movd %xmm0,%ebp
2b4873: 85 d2 test %edx,%edx

The segfault happens at the psubq instruction.

good (compiled with the flags specified in previous message):
2b3ef7: e8 c4 f7 ff ff call 2b36c0 <amdgpu_bo_wait>
2b3efc: 83 c4 10 add $0x10,%esp
2b3eff: e8 0c bb 55 00 call 80fa10 <os_time_get_nano>
2b3f04: 8b 4e 54 mov 0x54(%esi),%ecx
2b3f07: 03 81 04 02 00 00 add 0x204(%ecx),%eax
2b3f0d: 13 91 08 02 00 00 adc 0x208(%ecx),%edx
2b3f13: 2b 04 24 sub (%esp),%eax
2b3f16: 1b 54 24 04 sbb 0x4(%esp),%edx
2b3f1a: 89 81 04 02 00 00 mov %eax,0x204(%ecx)
2b3f20: 89 91 08 02 00 00 mov %edx,0x208(%ecx)
2b3f26: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
2b3f2d: 8d 76 00 lea 0x0(%esi),%esi
2b3f30: 8b 56 5c mov 0x5c(%esi),%edx
2b3f33: c7 44 24 18 00 00 00 movl $0x0,0x18(%esp)
2b3f3a: 00
2b3f3b: c7 04 24 00 00 00 00 movl $0x0,(%esp)
2b3f42: c7 44 24 04 00 00 00 movl $0x0,0x4(%esp)
2b3f49: 00
2b3f4a: 85 d2 test %edx,%edx
So the difference seems to be the usage of these SSE instructions...
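
A per-function comparison like the one above can be scripted. This is only a sketch: func_disasm is a hypothetical helper, and the bad/ and good/ paths in the usage comment are placeholders for wherever the two builds of radeonsi_dri.so live.

```shell
#!/bin/sh
# Print the body of one function from `objdump -d` output read on stdin,
# with the leading "address:" column stripped so two builds can be diffed.
# (func_disasm is a hypothetical helper name.)
func_disasm() {
    awk -v fn="$1" '
        $0 ~ "<" fn ">:$" { inside = 1; next }   # function header line
        inside && /^$/    { exit }                # blank line ends the function
        inside            { sub(/^[ \t]*[0-9a-f]+:[ \t]*/, ""); print }
    '
}

# Usage with placeholder paths (not taken from this report):
# objdump -d --no-show-raw-insn bad/radeonsi_dri.so  | func_disasm amdgpu_bo_map > bad.s
# objdump -d --no-show-raw-insn good/radeonsi_dri.so | func_disasm amdgpu_bo_map > good.s
# diff -u bad.s good.s
```

Call and jump targets still carry absolute addresses, so some noise remains in the diff, but instruction-selection changes such as movq/paddq/psubq versus add/adc/sub/sbb stand out clearly.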
Comment by Henri (Valta) Osmankäämi (cgx) - Thursday, 29 July 2021, 08:14 GMT
