Arch Linux

FS#80323 - [mesa] radeonsi python-pytorch ROCm segfaults

Attached to Project: Arch Linux
Opened by c (grinness) - Wednesday, 22 November 2023, 09:50 GMT
Last edited by Buggy McBugFace (bugbot) - Saturday, 25 November 2023, 20:21 GMT
Task Type: Bug Report
Category: Packages: Extra
Status: Closed
Assigned To: Jan Alexander Steffens (heftig), Laurent Carlier (lordheavy), Felix Yan (felixonmars)
Architecture: All
Severity: Low
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 0
Private: No

Details

Description:

I see exactly the same behavior as in FS#79725, which was closed with the updated python-pytorch-rocm package and the updated ROCm and HIP stack (5.7.1).
PyTorch applications segfault on gfx1030 (RX 6800).

Furthermore, running the following:

AMD_LOG_LEVEL=1 python
>>> import torch
>>> torch.cuda.current_device()

shows a bunch of errors regarding:

hipErrorNoBinaryForGpu: Unable to find code object for all current devices!
:1:hip_code_object.cpp :517 : 0976091129 us: [pid:24304 tid:0x7f34c722e740] Devices:
:1:hip_code_object.cpp :519 : 0976091133 us: [pid:24304 tid:0x7f34c722e740] amdgcn-amd-amdhsa--gfx1030 - [Not Found]
:1:hip_code_object.cpp :524 : 0976091135 us: [pid:24304 tid:0x7f34c722e740] Bundled Code Objects:
:1:hip_code_object.cpp :540 : 0976091138 us: [pid:24304 tid:0x7f34c722e740] host-x86_64-unknown-linux-- - [Unsupported]
:1:hip_code_object.cpp :537 : 0976091141 us: [pid:24304 tid:0x7f34c722e740] hipv4-amdgcn-amd-amdhsa--gfx906 - [code object targetID is amdgcn-amd-amdhsa--gfx906]

See attachment.
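
For anyone comparing their own output against the log above, here is a minimal Python sketch that pulls out which device targets were reported as "[Not Found]" and which GPU targets are bundled in the binary. The regular expressions and the function name `summarize_hip_log` are assumptions based only on the line layout of the excerpt above; other HIP versions may format these lines differently, and the full log may list more bundled targets than the excerpt shows.

```python
import re

# Patterns derived from the log excerpt in this report (an assumption, not a HIP spec).
DEVICE_RE = re.compile(r"\]\s+(amdgcn-amd-amdhsa--\S+)\s+-\s+\[Not Found\]")
BUNDLE_RE = re.compile(r"\]\s+hipv4-(amdgcn-amd-amdhsa--\S+)\s+-\s+\[")


def summarize_hip_log(log_text):
    """Return (device targets with no code object, GPU targets bundled in the binary)."""
    missing = DEVICE_RE.findall(log_text)
    bundled = BUNDLE_RE.findall(log_text)
    return missing, bundled


# Three lines copied from the log above.
SAMPLE = """\
:1:hip_code_object.cpp :519 : 0976091133 us: [pid:24304 tid:0x7f34c722e740] amdgcn-amd-amdhsa--gfx1030 - [Not Found]
:1:hip_code_object.cpp :540 : 0976091138 us: [pid:24304 tid:0x7f34c722e740] host-x86_64-unknown-linux-- - [Unsupported]
:1:hip_code_object.cpp :537 : 0976091141 us: [pid:24304 tid:0x7f34c722e740] hipv4-amdgcn-amd-amdhsa--gfx906 - [code object targetID is amdgcn-amd-amdhsa--gfx906]
"""

missing, bundled = summarize_hip_log(SAMPLE)
print("devices without a code object:", missing)
print("GPU targets bundled in the binary:", bundled)
```

If a device target appears in the first list but not in the second, the binary was not built for that GPU, which matches the hipErrorNoBinaryForGpu message.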

Note that compiling from source with magma disabled (USE_MAGMA=OFF) solves the problem.
I also attach the PKGBUILD that works, for reference. It is the same PKGBUILD posted in FS#79725.

Closed by  Buggy McBugFace (bugbot)
Saturday, 25 November 2023, 20:21 GMT
Reason for closing:  Moved
Additional comments about closing: https://gitlab.archlinux.org/archlinux/packaging/packages/mesa/issues/3
Comment by Toolybird (Toolybird) - Wednesday, 22 November 2023, 19:40 GMT
@grinness, you should know by now that when reporting segfault crashes, you *must* provide a backtrace. And it *must* be a backtrace that includes debugging information via debuginfod. For example, the backtrace you posted in FS#80301 is missing debug symbols. If you don't see source code line numbers in the trace then it's essentially useless.

Ensure gdb is installed then:

$ coredumpctl gdb (then answer y when it asks "Enable debuginfod for this session?")
(gdb) set logging enabled
(gdb) bt (or bt full)

Then post gdb.txt

More reading at [1][2]

[1] https://blogs.gnome.org/mcatanzaro/2021/09/18/creating-quality-backtraces-for-crash-reports/
[2] https://wiki.archlinux.org/title/Debugging/Getting_traces
Comment by c (grinness) - Wednesday, 22 November 2023, 20:39 GMT
@Toolybird (Toolybird)

Apologies. I ran a sample Python script that trains a neural network under gdb and found that the segmentation fault is not in pytorch-rocm; it is actually caused by a call to matplotlib (with the relevant code commented out, there is no segmentation fault).
I attach the gdb output regardless. The debug info seems to point to unaligned memory in radeonsi.
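
To narrow down which Python-level call triggers a native crash like this, the standard-library `faulthandler` module can dump the Python traceback when the process receives SIGSEGV. The following is a minimal sketch, not the method actually used for this report; the helper name `enable_crash_traceback` and the log file name are hypothetical. It only shows Python frames, so it complements the gdb backtrace requested above rather than replacing it.

```python
import faulthandler
import os
import tempfile


def enable_crash_traceback(log_path):
    """Write Python tracebacks of all threads to log_path on SIGSEGV and other fatal signals."""
    # The file must stay open for as long as faulthandler is enabled.
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Hypothetical log location; any writable path works.
crash_log_path = os.path.join(tempfile.gettempdir(), "python-crash.log")
crash_log = enable_crash_traceback(crash_log_path)

# ... run the training and plotting code here ...
```

The same handler can also be enabled without code changes by starting the interpreter with `python -X faulthandler`, which writes the traceback to stderr.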

Note that the warnings about amdgcn-amd-amdhsa--gfx1030 - [Not Found] are present running the sample code provided in my first post.

If you and the maintainer want I can close this and open a new one with the correct title.

Comment by Toolybird (Toolybird) - Wednesday, 22 November 2023, 21:05 GMT
> the debug info seems to point to unaligned memory in radeonsi

OK, thanks for that. It therefore looks like an upstream bug in mesa. You should probably report this crash upstream, but I will first reassign this ticket to the mesa PMs for a look-see.
