FS#78091 - [vulkan-validation-layers] sway vulkan renderer is broken after update

Attached to Project: Arch Linux
Opened by Nikolaos Bezirgiannis (bezirg) - Saturday, 01 April 2023, 15:43 GMT
Last edited by Laurent Carlier (lordheavy) - Sunday, 11 June 2023, 12:27 GMT
Task Type Bug Report
Category Packages: Extra
Status Closed
Assigned To Laurent Carlier (lordheavy)
Architecture All
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 1
Private No

Details

Description:

After the recent update to `vulkan-validation-layers`, sway's vulkan renderer backend stopped working and
gives the error:

```
00:00:00.140 [wlr] [types/wlr_drm_lease_v1.c:715] No DRM backend supplied, failed to create wlr_drm_lease_v1_manager
2023-04-01 17:38:04 - [swaybg-1.2.0/main.c:582] wl_display_roundtrip failed
```

Additional info:

* Affected package version

Version : 1.3.243.0-1

* Working package version

Version: 1.3.236.0-1

Steps to reproduce:

Run `WLR_RENDERER=vulkan sway`



Closed by  Laurent Carlier (lordheavy)
Sunday, 11 June 2023, 12:27 GMT
Reason for closing:  Fixed
Additional comments about closing:  vulkan-validation-layers-1.3.250.0-1
Comment by Nikolaos Bezirgiannis (bezirg) - Saturday, 01 April 2023, 15:44 GMT
Simply downgrading `vulkan-validation-layers` to the previous version, 1.3.236.0-1, fixes the problem.
Comment by Toolybird (Toolybird) - Saturday, 01 April 2023, 23:23 GMT
Related  FS#78083 
Comment by Andrew O'Neil (ajoneilnz) - Sunday, 02 April 2023, 00:31 GMT
I was directed here from  FS#78083 

I am getting the following validation errors in my vulkan application with the repo provided vulkan-validation-layers 1.3.243.0-1:

VALIDATION - "Validation Error: [ SYNC-HAZARD-READ-AFTER-WRITE ] Object 0: handle = 0xfef35a00000000a0, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0xe4d96472 | vkCmdCopyImage: Hazard READ_AFTER_WRITE for srcImage VkImage 0xfef35a00000000a0[], region 0. Access info (usage: SYNC_COPY_TRANSFER_READ, prior_usage: SYNC_IMAGE_LAYOUT_TRANSITION, write_barriers: 0, command: vkCmdPipelineBarrier2, seq_no: 2, reset_no: 1)."
VALIDATION - "Validation Error: [ SYNC-HAZARD-WRITE-AFTER-WRITE ] Object 0: handle = 0xead9370000000008, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x5c0ec5d6 | vkCmdCopyImage: Hazard WRITE_AFTER_WRITE for dstImage VkImage 0xead9370000000008[], region 0. Access info (usage: SYNC_COPY_TRANSFER_WRITE, prior_usage: SYNC_IMAGE_LAYOUT_TRANSITION, write_barriers: 0, command: vkCmdPipelineBarrier2, seq_no: 1, reset_no: 1)."


These were present when I was developing with the repo 1.3.236.0. Some debugging led me to believe it was likely a bug in the validation layer itself, given the bug reports and fixes listed on their GitHub. I ended up building 1.3.239.0 myself, as this version was never added to the Arch repos, and the validation errors disappeared. I required a patch to get this version to build, which I've attached.

After the official Arch package updated to 1.3.243.0-1, the validation errors came back. However, building my own package of 1.3.243.0 resolves these errors once again. I am also unable to build the PKGBUILD from the Arch repo as-is; I still require the patch that I have attached. The build failure persists on both of my Arch Linux machines, from within a clean docker archlinux:base-devel, and from within a clean chroot as directed by the previous ticket I have opened.

I'm struggling to understand why I am unable to build the PKGBUILD as-is, which apparently works fine for the Arch developers according to the previous ticket, and I don't understand why these validation errors are back in the officially built package when they don't appear when I build it myself.
Comment by michael buckley (mokchira) - Sunday, 16 April 2023, 21:24 GMT
I am experiencing the same issue as ajoneilnz: vulkan-validation-layers as installed by pacman (version 1.3.243.0-1) reports incorrect errors. I also hit the same compile error he listed in  FS#78083  trying to build the package from $srcdir with makepkg.

If I run extra-x86_64-build instead of makepkg, I get the same libVkLayer_khronos_validation.so that causes the errors: it has a sha256sum of 755baa2a3b43ec497138c35c603ce225d5c1ff0cbc8e701ff314dd5b45bec70c if that helps at all. Note it's possible that this only spews bad errors in some applications. Since you would need an application to reproduce, an imgui application running the vulkan backend should do it (it does for me).
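
For anyone wanting to check whether their installed library is the same build, a minimal sketch of the comparison follows. The install path and the helper name `matches_sha256` are assumptions for illustration; only the digest comes from the comment above.

```shell
# Compare a file's SHA-256 digest against an expected hex string.
# Returns 0 on match, non-zero otherwise.
matches_sha256() {
    file=$1
    expected=$2
    # sha256sum prints "<digest>  <filename>"; keep only the digest.
    actual=$(sha256sum "$file" | cut -d ' ' -f 1)
    [ "$actual" = "$expected" ]
}

# Assumed install path; adjust to wherever the layer library lives.
lib=/usr/lib/libVkLayer_khronos_validation.so
# Digest quoted in the comment above.
bad=755baa2a3b43ec497138c35c603ce225d5c1ff0cbc8e701ff314dd5b45bec70c
if [ -f "$lib" ] && matches_sha256 "$lib" "$bad"; then
    echo "library matches the build reported as faulty"
fi
```

A mismatch does not prove the library is fine; it only shows the binary differs from the one described here.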

If I just unpack sdk-1.3.243.0.tar.gz and run the update_deps.py and build command myself, it builds without errors and the resulting lib does not report any wrong errors when I run a vulkan application.

The cause of the discrepancy seems to be the CXXFLAGS env var that gets set up when you go through makepkg. Those flags trickle down to the build and result in different behavior in the resulting library.

On my machine at least those flags are:

CXXFLAGS=-march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security -fstack-clash-protection -fcf-protection -Wp,-D_GLIBCXX_ASSERTIONS -g -ffile-prefix-map=/build/vulkan-validation-layers/src=/usr/src/debug/vulkan-validation-layers -flto=auto

(I got these flags by inserting a printenv in the build() function of the PKGBUILD file and then running /chrootbuild from inside the chroot with the bind dirs set up. I could also have just run extra-x86_64-build.)

If you set your CXXFLAGS environment variable to the above and then build the validation layers natively from source (essentially just following the build() function in the PKGBUILD), you will generate a library that has the same issue as the pacman one.

I also noticed some concerning warnings generated from the build by running extra-x86_64-build:

/usr/src/debug/vulkan-validation-layers/Vulkan-ValidationLayers-sdk-1.3.243.0/build/SPIRV-Tools/tools/opt/opt.cpp:628:55: warning: argument 1 value ‘18446744073709551615’ exceeds maximum object size 9223372036854775807 [-Walloc-size-larger-than=]
/usr/include/c++/12.2.1/new:128:26: note: in a call to allocation function ‘operator new []’ declared here
128 | _GLIBCXX_NODISCARD void* operator new[](std::size_t) _GLIBCXX_THROW (std::bad_alloc)

/usr/src/debug/vulkan-validation-layers/Vulkan-ValidationLayers-sdk-1.3.243.0/build/SPIRV-Tools/source/opt/types.h:68:7: warning: virtual table of type ‘struct Type’ violates one definition rule [-Wodr]
/usr/src/debug/vulkan-validation-layers/Vulkan-ValidationLayers-sdk-1.3.243.0/build/SPIRV-Tools/source/opt/types.h:68:7: note: the conflicting type defined in another translation unit has virtual table with more entries
/usr/src/debug/vulkan-validation-layers/Vulkan-ValidationLayers-sdk-1.3.243.0/build/SPIRV-Tools/source/opt/scalar_analysis_nodes.h:43:7: warning: virtual table of type ‘struct SENode’ violates one definition rule [-Wodr]
/usr/src/debug/vulkan-validation-layers/Vulkan-ValidationLayers-sdk-1.3.243.0/build/SPIRV-Tools/source/opt/scalar_analysis_nodes.h:43:7: note: the conflicting type defined in another translation unit has virtual table with more entries
/usr/src/debug/vulkan-validation-layers/Vulkan-ValidationLayers-sdk-1.3.243.0/build/SPIRV-Tools/source/opt/instruction_list.h:44:7: warning: virtual table of type ‘struct InstructionList’ violates one definition rule [-Wodr]
/usr/src/debug/vulkan-validation-layers/Vulkan-ValidationLayers-sdk-1.3.243.0/build/SPIRV-Tools/source/opt/instruction_list.h:44:7: note: the conflicting type defined in another translation unit has virtual table with more entries
/usr/src/debug/vulkan-validation-layers/Vulkan-ValidationLayers-sdk-1.3.243.0/build/SPIRV-Tools/source/util/ilist.h:49:7: warning: virtual table of type ‘struct IntrusiveList’ violates one definition rule [-Wodr]
/usr/src/debug/vulkan-validation-layers/Vulkan-ValidationLayers-sdk-1.3.243.0/build/SPIRV-Tools/source/util/ilist.h:49:7: note: the conflicting type defined in another translation unit has virtual table with more entries
/usr/src/debug/vulkan-validation-layers/Vulkan-ValidationLayers-sdk-1.3.243.0/build/SPIRV-Tools/source/opt/instruction.h:182:7: warning: virtual table of type ‘struct Instruction’ violates one definition rule [-Wodr]
/usr/src/debug/vulkan-validation-layers/Vulkan-ValidationLayers-sdk-1.3.243.0/build/SPIRV-Tools/source/opt/instruction.h:182:7: note: the conflicting type defined in another translation unit has virtual table with more entries
/usr/src/debug/vulkan-validation-layers/Vulkan-ValidationLayers-sdk-1.3.243.0/build/SPIRV-Tools/source/opt/constants.h:289:7: warning: virtual table of type ‘struct CompositeConstant’ violates one definition rule [-Wodr]
/usr/src/debug/vulkan-validation-layers/Vulkan-ValidationLayers-sdk-1.3.243.0/build/SPIRV-Tools/source/opt/constants.h:289:7: note: the conflicting type defined in another translation unit has virtual table with more entries
/usr/src/debug/vulkan-validation-layers/Vulkan-ValidationLayers-sdk-1.3.243.0/build/SPIRV-Tools/source/opt/constants.h:145:7: warning: virtual table of type ‘struct ScalarConstant’ violates one definition rule [-Wodr]
/usr/src/debug/vulkan-validation-layers/Vulkan-ValidationLayers-sdk-1.3.243.0/build/SPIRV-Tools/source/opt/constants.h:145:7: note: the conflicting type defined in another translation unit has virtual table with more entries
/usr/src/debug/vulkan-validation-layers/Vulkan-ValidationLayers-sdk-1.3.243.0/build/SPIRV-Tools/source/util/small_vector.h:42:7: warning: virtual table of type ‘struct SmallVector’ violates one definition rule [-Wodr]
/usr/src/debug/vulkan-validation-layers/Vulkan-ValidationLayers-sdk-1.3.243.0/build/SPIRV-Tools/source/util/small_vector.h:42:7: note: the conflicting type defined in another translation unit has virtual table with more entries
/usr/src/debug/vulkan-validation-layers/Vulkan-ValidationLayers-sdk-1.3.243.0/build/SPIRV-Tools/source/util/ilist_node.h:30:7: warning: virtual table of type ‘struct IntrusiveNodeBase’ violates one definition rule [-Wodr]
/usr/src/debug/vulkan-validation-layers/Vulkan-ValidationLayers-sdk-1.3.243.0/build/SPIRV-Tools/source/util/ilist_node.h:30:7: note: the conflicting type defined in another translation unit has virtual table with more entries

To be clear, you should see the above output if you run
$ asp checkout vulkan-validation-layers
$ cd vulkan-validation-layers/trunk
$ extra-x86_64-build

Again, these show up if you have those CXXFLAGS set.
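
To triage a long log like the one above, it may help to reduce it to the list of types that trigger -Wodr. The following is a rough sketch only: the function name `odr_types` and the file name `build.log` are made up for illustration, and the pattern is written against the warning wording shown in this comment.

```shell
# Print the unique type names flagged by -Wodr in a build log read from stdin.
odr_types() {
    # Capture the identifier after "struct" on each ODR warning line,
    # skipping the accompanying "note:" lines, then de-duplicate.
    sed -n 's/.*virtual table of type [^ ]*struct \([A-Za-z_][A-Za-z0-9_]*\).*violates one definition rule.*/\1/p' | sort -u
}

# Hypothetical usage: capture the build output first, then summarize it.
# extra-x86_64-build 2>&1 | tee build.log
# odr_types < build.log
```

For the warnings quoted above, this would list names such as Type, SENode and InstructionList, all of which belong to SPIRV-Tools.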

Since there is nothing in those CXXFLAGS that seems wrong to me - they seem to just be providing better security and overflow protection, and nothing that should actually change the observable behavior of the program - it seems like this might be an upstream problem.

I'm currently running some tests to find exactly which flag(s) change the behavior. Once I do, I'll probably file a bug upstream. SPIRV-Tools is the one producing those warnings, so it is my current guess as to what is actually causing the problem.
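
One way such a search could be organized is to rebuild with each flag left out in turn and see which omission makes the bad errors go away. The sketch below only generates the candidate flag sets; it is not necessarily the procedure used here, the helper name `leave_one_out` is hypothetical, and the flag list is shortened for illustration.

```shell
# Print one candidate flag set per line, each omitting exactly one flag
# from the space-separated list given as $1.
leave_one_out() {
    all=$1
    for skip in $all; do
        candidate=
        for flag in $all; do
            if [ "$flag" != "$skip" ]; then
                candidate="${candidate:+$candidate }$flag"
            fi
        done
        printf '%s\n' "$candidate"
    done
}

# Shortened, illustrative flag list; substitute the full CXXFLAGS string.
leave_one_out "-O2 -fno-plt -flto=auto" | while IFS= read -r flags; do
    printf 'would build with CXXFLAGS=%s\n' "$flags"
done
```

Each printed line would then be exported as CXXFLAGS for one native build, followed by a run of the affected Vulkan application.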
Comment by michael buckley (mokchira) - Sunday, 16 April 2023, 23:26 GMT
Accidental duplicate comment.
Comment by michael buckley (mokchira) - Monday, 17 April 2023, 00:00 GMT
Update: the -flto=auto flag looks like the main culprit. With just
CXXFLAGS=-flto=auto -O2
I was able to reproduce the bad error messages.
It seems like the flag -Wp,-D_GLIBCXX_ASSERTIONS is responsible for the build errors when just running makepkg without the chroot.

Not sure whether this is best worked around in the package or whether it means there is an issue upstream. Fixing it in the package would seemingly just require stripping these flags from the CXXFLAGS env var, and I'm not sure if that is kosher or not.
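
As a rough illustration of what stripping flags from CXXFLAGS could look like inside a build() function, here is a small POSIX shell sketch. The helper `strip_flags` is hypothetical and the flag string is shortened; this is not what the package actually does.

```shell
# Remove every occurrence of the given flags from a space-separated flag string.
# $1: flag string; remaining arguments: flags to drop.
strip_flags() {
    input=$1
    shift
    kept=
    for flag in $input; do
        drop=no
        for unwanted in "$@"; do
            if [ "$flag" = "$unwanted" ]; then
                drop=yes
            fi
        done
        if [ "$drop" = no ]; then
            kept="${kept:+$kept }$flag"
        fi
    done
    printf '%s\n' "$kept"
}

# Shortened, illustrative value; makepkg would supply the real one.
CXXFLAGS="-O2 -pipe -Wp,-D_GLIBCXX_ASSERTIONS -g -flto=auto"
CXXFLAGS=$(strip_flags "$CXXFLAGS" -flto=auto -Wp,-D_GLIBCXX_ASSERTIONS)
printf '%s\n' "$CXXFLAGS"
```

For the LTO flag specifically, makepkg also lets a PKGBUILD opt out declaratively with options=('!lto'), which may be the more conventional route than editing the variable by hand.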

One thing that complicates things a bit is that part of the build() step is to build the dependencies, which basically means invoking a Python script that calls CMake. It might be possible to build the dependencies in such a way that these flags do not cause problems, either by digging into this script and changing it, or by avoiding it entirely and building the dependencies differently. They are all CMake-based dependencies, so it is not too difficult to build them without the script.
Comment by i0f (I0F) - Friday, 09 June 2023, 18:06 GMT
I just built the new 1.3.250.0 release myself with the PKGBUILD and it seems to be fixed with that.
Comment by Laurent Carlier (lordheavy) - Sunday, 11 June 2023, 09:03 GMT
Please test with vulkan-validation-layers-1.3.250.0-1
Comment by Nikolaos Bezirgiannis (bezirg) - Sunday, 11 June 2023, 10:41 GMT
Yes, I also confirm that vulkan-validation-layers-1.3.250.0-1 fixed WLR_RENDERER=vulkan for sway.
Comment by i0f (I0F) - Sunday, 11 June 2023, 10:56 GMT
Yes, it works with the official package.
