FS#69810 - [linux] System doesn't boot with amd_iommu=off

Attached to Project: Arch Linux
Opened by Mthw (jari_45) - Saturday, 27 February 2021, 13:03 GMT
Last edited by Jan Alexander Steffens (heftig) - Friday, 11 March 2022, 16:36 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To Jan Alexander Steffens (heftig)
Architecture All
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 1
Private No

Details

Description:
I have a laptop with an AMD 3550H CPU, and since kernel 5.11.x it doesn't boot at all with the 'amd_iommu=off' kernel parameter.
To give you more info, when I select the boot entry in systemd-boot nothing happens.
There are no error messages or anything, just a blank screen (an external screen gets no signal from the laptop either), and I need to shut down with the power button.
What other info should I provide?
Linux 5.10 kernels (like current linux-lts package) and older work correctly.
This is most likely not caused by Arch and will need to be reported elsewhere.

Closed by  Jan Alexander Steffens (heftig)
Friday, 11 March 2022, 16:36 GMT
Reason for closing:  Fixed
Comment by loqs (loqs) - Saturday, 27 February 2021, 13:50 GMT
Have you looked at  FS#69757 ?
Comment by Mthw (jari_45) - Saturday, 27 February 2021, 15:27 GMT
I have now, but I don't think it's the same issue. I have tested some of the kernels you have posted and I have different results:
linux-loqs-5.10-1-x86_64.pkg.tar.zst - OK
linux-loqs-5.10.r3014.g76d4acf22b48-1-x86_64.pkg.tar - OK
linux 5.11.2.arch1-1 - black screen (regardless of amdgpu module blacklisting)
linux-mainline 5.11 - same
Comment by loqs (loqs) - Saturday, 27 February 2021, 15:33 GMT
Please try
https://drive.google.com/file/d/1jp6strNz5J4vYKT5Lht9wN2FuPN1mqsf/view?usp=sharing linux-loqs-5.10.r7737.g538fcf57aaee-1-x86_64.pkg.tar.zst mid point of commits for 5.11
Comment by Mthw (jari_45) - Saturday, 27 February 2021, 16:03 GMT
It gives a black screen
Comment by loqs (loqs) - Saturday, 27 February 2021, 16:48 GMT
Here is the next one:
https://drive.google.com/file/d/1UJ-3HRFfOZgQ1VCUvxshEam7tFp2CdSf/view?usp=sharing linux-loqs-5.10.r5507.gd635a69dd498-1-x86_64.pkg.tar.zst
Comment by Mthw (jari_45) - Saturday, 27 February 2021, 18:01 GMT
Also a black screen.
Comment by loqs (loqs) - Saturday, 27 February 2021, 18:45 GMT
Here is the next one:
https://drive.google.com/file/d/1dmkNUHXx3uOkjm-IHblACx6aYopB52KX/view?usp=sharing linux-loqs-5.10rc6.r1403.ga1dd1d869731-1-x86_64.pkg.tar.zst
Comment by Mthw (jari_45) - Saturday, 27 February 2021, 20:21 GMT
This one is OK.
Comment by loqs (loqs) - Saturday, 27 February 2021, 20:43 GMT
Here is the next one:
https://drive.google.com/file/d/1k65iW8Wj-_o1zPWyhKfVdDuf5tBy4md1/view?usp=sharing linux-loqs-5.10rc7.r2197.ge5795aacd71b-1-x86_64.pkg.tar.zst
Comment by Jordi (jordicoma) - Saturday, 27 February 2021, 22:03 GMT
Confirmed the bug with the last kernel "Linux ryzen 5.11.2-arch1-1 #1 SMP PREEMPT Fri, 26 Feb 2021 18:26:41 +0000 x86_64 GNU/Linux", ryzen 1600x. It doesn't boot until I delete "amd_iommu=off" from the parameters.
Comment by loqs (loqs) - Sunday, 28 February 2021, 00:03 GMT
If linux-loqs-5.10rc7.r2197.ge5795aacd71b-1-x86_64.pkg.tar.zst is bad the next one is
https://drive.google.com/file/d/1kbaTVC9cb1hH2wXO5KwVPh6BqZ1h7CUB/view?usp=sharing linux-loqs-5.10rc6.r1760.gea6d5c924e39-1-x86_64.pkg.tar.zst
If it was good
https://drive.google.com/file/d/1fV6eWGztA541L8NA0y95IE1Bw0Lxr2kn/view?usp=sharing linux-loqs-5.10.r200.gdfefd226b0bf-1-x86_64.pkg.tar.zst
Comment by Mthw (jari_45) - Sunday, 28 February 2021, 07:54 GMT
Builds 5.10.0-rc7-1-loqs-02197-ge5795aacd71b, 5.10.0-1-loqs-00200-gdfefd226b0bf and 5.10.0-rc6-1-loqs-01760-gea6d5c924e39 are all good.
Comment by loqs (loqs) - Sunday, 28 February 2021, 08:29 GMT
Here is the next one:
https://drive.google.com/file/d/12OIkV-7_vOG-Y7Z50fsVtHsCIdNmRrr5/view?usp=sharing linux-loqs-5.10.r3165.geb0ea74120e0-1-x86_64.pkg.tar.zst
Comment by Mthw (jari_45) - Sunday, 28 February 2021, 08:41 GMT
Also good. Question: Do you make these builds by choosing a commit(?) somewhere between a known good version and a known bad version and then build it? And secondly, you seem to be able to build them quite fast, how long does a build take or do you have them pre-built? IIRC the last time I tried to build a kernel it took me ~2 hours not counting downloading the source which is ~1 more hour.
Comment by loqs (loqs) - Sunday, 28 February 2021, 08:49 GMT
I am using git bisect [1] to choose the commits. Parallel compilation [2] can reduce the build time. The first few builds are common to all bisections between 5.10 and 5.11, so those already existed from helping someone else. After that I need to know whether the last build was good or bad to feed back into git. It takes roughly 25 minutes per build, including the upload.

[1] https://wiki.archlinux.org/index.php/Bisecting_bugs_with_Git
[2] https://wiki.archlinux.org/index.php/Makepkg#Parallel_compilation
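
As a rough illustration of why bisection needs so few builds, here is a minimal Python sketch of the search over a linear list of revisions. It is a simplification and not how git implements it: git bisect works on the commit graph rather than a flat list, and the names `bisect_first_bad`, `revisions` and `is_bad` are made up for this example.

```python
from typing import Callable, Sequence, Tuple


def bisect_first_bad(
    revisions: Sequence[str],
    is_bad: Callable[[str], bool],
) -> Tuple[str, int]:
    """Find the first bad revision in an oldest-to-newest list.

    Assumes the last entry is already known to be bad and that every
    revision after the first bad one is bad as well.
    Returns the first bad revision and the number of tests performed.
    """
    lo, hi = 0, len(revisions) - 1  # revisions[hi] is known bad
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1  # one "build, boot and report" round trip
        if is_bad(revisions[mid]):
            hi = mid  # first bad revision is at mid or earlier
        else:
            lo = mid + 1  # first bad revision is after mid
    return revisions[lo], tests


if __name__ == "__main__":
    # Hypothetical revision names; r10 is the pretend regression.
    revs = ["r%d" % i for i in range(16)]
    first, tests = bisect_first_bad(revs, lambda r: int(r[1:]) >= 10)
    print(first, tests)
```

Each test halves the remaining range, so the number of builds grows with the logarithm of the number of commits between the good and bad versions.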
Comment by Mthw (jari_45) - Monday, 01 March 2021, 08:24 GMT
Just tested 5.12.0-rc1-1-mainline and it's also broken, as expected.
Comment by loqs (loqs) - Monday, 01 March 2021, 08:54 GMT
Did you check
https://drive.google.com/file/d/12OIkV-7_vOG-Y7Z50fsVtHsCIdNmRrr5/view?usp=sharing linux-loqs-5.10.r3165.geb0ea74120e0-1-x86_64.pkg.tar.zst
Comment by Mthw (jari_45) - Monday, 01 March 2021, 13:33 GMT
Yes, I already checked that one, it's good.
Comment by loqs (loqs) - Monday, 01 March 2021, 14:03 GMT
Here is the next one:
https://drive.google.com/file/d/1My36BKy4PhH-oInlH_y5z1zi3UI7oU35/view?usp=sharing linux-loqs-5.10rc7.r2279.g22f07b86d4e5-1-x86_64.pkg.tar.zst
Edit:
If the above is good then please try:
https://drive.google.com/file/d/1Zp7tS-pp964ztO5Yg6rcAhIKZzUq6Ecx/view?usp=sharing linux-loqs-5.10rc1.r42.g26ab12bb9d96-1-x86_64.pkg.tar.zst
Comment by Mthw (jari_45) - Monday, 01 March 2021, 18:17 GMT
The first one works, the second one doesn't.
Comment by loqs (loqs) - Monday, 01 March 2021, 19:00 GMT
https://drive.google.com/file/d/1PQBBrGHGptainsKV6htSVKp66KVtaugc/view?usp=sharing linux-loqs-5.10rc1.r21.g341b4a7211b6-1-x86_64.pkg.tar.zst

20 revisions left to test after this (roughly 4 steps)
Comment by Mthw (jari_45) - Monday, 01 March 2021, 20:17 GMT
This one is good.
Comment by loqs (loqs) - Monday, 01 March 2021, 21:00 GMT
https://drive.google.com/file/d/1I6zF2VqdNBRlmwM2DX_Jss8l8wDK64_6/view?usp=sharing linux-loqs-5.10rc1.r31.g79eb3581bcaa-1-x86_64.pkg.tar.zst

git bisect log
git bisect start
# bad: [f40ddce88593482919761f74910f42f4b84c004b] Linux 5.11
git bisect bad f40ddce88593482919761f74910f42f4b84c004b
# good: [2c85ebc57b3e1817b6ce1a6b703928e113a90442] Linux 5.10
git bisect good 2c85ebc57b3e1817b6ce1a6b703928e113a90442
# bad: [538fcf57aaee6ad78a05f52b69a99baa22b33418] Merge branches 'acpi-scan', 'acpi-pnp' and 'acpi-sleep'
git bisect bad 538fcf57aaee6ad78a05f52b69a99baa22b33418
# bad: [d635a69dd4981cc51f90293f5f64268620ed1565] Merge tag 'net-next-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
git bisect bad d635a69dd4981cc51f90293f5f64268620ed1565
# good: [a1dd1d86973182458da7798a95f26cfcbea599b4] Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
git bisect good a1dd1d86973182458da7798a95f26cfcbea599b4
# good: [e5795aacd71b697c739f2d193b0e275993d93187] Merge tag 'wireless-drivers-next-2020-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
git bisect good e5795aacd71b697c739f2d193b0e275993d93187
# good: [dfefd226b0bf7c435a58d75a0ce2f9273b9825f6] mm: cleanup kstrto*() usage
git bisect good dfefd226b0bf7c435a58d75a0ce2f9273b9825f6
# good: [eb0ea74120e0f14a6d6454109153d1b4ccf210fc] Merge tag 'x86-fpu-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good eb0ea74120e0f14a6d6454109153d1b4ccf210fc
# good: [22f07b86d4e580424cbeb0ce232ed30d4b5ecb95] Merge branch 'bnxt_en-improve-firmware-flashing'
git bisect good 22f07b86d4e580424cbeb0ce232ed30d4b5ecb95
# bad: [26ab12bb9d96133b7880141d68b5e01a8783de9d] iommu/hyper-v: Remove I/O-APIC ID check from hyperv_irq_remapping_select()
git bisect bad 26ab12bb9d96133b7880141d68b5e01a8783de9d
# good: [341b4a7211b6ba3a7089e1dc09ac4bd576dfb05f] x86/ioapic: Cleanup IO/APIC route entry structs
git bisect good 341b4a7211b6ba3a7089e1dc09ac4bd576dfb05f

Comment by Mthw (jari_45) - Tuesday, 02 March 2021, 05:45 GMT
Build 5.10.0-rc1-1-loqs-00031-g79eb3581bcaa is OK.
Comment by loqs (loqs) - Tuesday, 02 March 2021, 06:13 GMT
git bisect good
Bisecting: 5 revisions left to test after this (roughly 3 steps)
[d981059e13ffa9ed03a73472e932d070323bd057] x86/hyperv: Enable 15-bit APIC ID if the hypervisor supports it

https://drive.google.com/file/d/1QNmoOmXq6_MtrCJ15dld3GUHbDwrXmQt/view?usp=sharing linux-loqs-5.10rc1.r36.gd981059e13ff-1-x86_64.pkg.tar.zst
Comment by Mthw (jari_45) - Tuesday, 02 March 2021, 06:27 GMT
Build 5.10.0-rc1-1-loqs-00036-gd981059e13ff is also good.
Comment by loqs (loqs) - Tuesday, 02 March 2021, 06:45 GMT
git bisect good
Bisecting: 2 revisions left to test after this (roughly 2 steps)
[2fb6acf3edfeb904505f9ba3fd01166866062591] iommu/amd: Fix union of bitfields in intcapxt support

https://drive.google.com/file/d/1KnowvzfXwK6tWzMJpu-LuIYe5s7lSwQj/view?usp=sharing linux-loqs-5.10rc1.r39.g2fb6acf3edfe-1-x86_64.pkg.tar.zst
Comment by Mthw (jari_45) - Tuesday, 02 March 2021, 06:52 GMT
This one is bad.
Comment by loqs (loqs) - Tuesday, 02 March 2021, 07:08 GMT
git bisect bad
Bisecting: 0 revisions left to test after this (roughly 1 step)
[aec8da04e4d71afdd4ab3025ea34a6517435f363] x86/ioapic: Correct the PCI/ISA trigger type selection

https://drive.google.com/file/d/1eA1pXjHec0OvZT9hBV8H19DpUv55VNoC/view?usp=sharing linux-loqs-5.10rc1.r38.gaec8da04e4d7-1-x86_64.pkg.tar.zst
Comment by Mthw (jari_45) - Tuesday, 02 March 2021, 07:38 GMT
This one is also bad.
Comment by loqs (loqs) - Tuesday, 02 March 2021, 08:02 GMT
git bisect bad
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[f36a74b9345aebaf5d325380df87a54720229d18] x86/ioapic: Use I/O-APIC ID for finding irqdomain, not index

https://drive.google.com/file/d/1BVeJO5B5xXAQ_aygQ6JKSLe_WfIAkS_5/view?usp=sharing linux-loqs-5.10rc1.r37.gf36a74b9345a-1-x86_64.pkg.tar.zst
Comment by Mthw (jari_45) - Tuesday, 02 March 2021, 08:13 GMT
This one is bad too. The last one working correctly is r36, I just re-checked that one too.
Comment by loqs (loqs) - Tuesday, 02 March 2021, 08:23 GMT
git bisect bad
f36a74b9345aebaf5d325380df87a54720229d18 is the first bad commit
commit f36a74b9345aebaf5d325380df87a54720229d18
Author: David Woodhouse <dwmw@amazon.co.uk>
Date: Tue Nov 3 16:36:22 2020 +0000

x86/ioapic: Use I/O-APIC ID for finding irqdomain, not index

In commit b643128b917 ("x86/ioapic: Use irq_find_matching_fwspec() to
find remapping irqdomain") the I/O-APIC code was changed to find its
parent irqdomain using irq_find_matching_fwspec(), but the key used
for the lookup was wrong. It shouldn't use 'ioapic' which is the index
into its own ioapics[] array. It should use the actual arbitration
ID of the I/O-APIC in question, which is mpc_ioapic_id(ioapic).

Fixes: b643128b917 ("x86/ioapic: Use irq_find_matching_fwspec() to find remapping irqdomain")
Reported-by: lkp <oliver.sang@intel.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/57adf2c305cd0c5e9d860b2f3007a7e676fd0f9f.camel@infradead.org

arch/x86/kernel/apic/io_apic.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

git bisect log
git bisect start
# bad: [f40ddce88593482919761f74910f42f4b84c004b] Linux 5.11
git bisect bad f40ddce88593482919761f74910f42f4b84c004b
# good: [2c85ebc57b3e1817b6ce1a6b703928e113a90442] Linux 5.10
git bisect good 2c85ebc57b3e1817b6ce1a6b703928e113a90442
# bad: [538fcf57aaee6ad78a05f52b69a99baa22b33418] Merge branches 'acpi-scan', 'acpi-pnp' and 'acpi-sleep'
git bisect bad 538fcf57aaee6ad78a05f52b69a99baa22b33418
# bad: [d635a69dd4981cc51f90293f5f64268620ed1565] Merge tag 'net-next-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
git bisect bad d635a69dd4981cc51f90293f5f64268620ed1565
# good: [a1dd1d86973182458da7798a95f26cfcbea599b4] Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
git bisect good a1dd1d86973182458da7798a95f26cfcbea599b4
# good: [e5795aacd71b697c739f2d193b0e275993d93187] Merge tag 'wireless-drivers-next-2020-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
git bisect good e5795aacd71b697c739f2d193b0e275993d93187
# good: [dfefd226b0bf7c435a58d75a0ce2f9273b9825f6] mm: cleanup kstrto*() usage
git bisect good dfefd226b0bf7c435a58d75a0ce2f9273b9825f6
# good: [eb0ea74120e0f14a6d6454109153d1b4ccf210fc] Merge tag 'x86-fpu-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good eb0ea74120e0f14a6d6454109153d1b4ccf210fc
# good: [22f07b86d4e580424cbeb0ce232ed30d4b5ecb95] Merge branch 'bnxt_en-improve-firmware-flashing'
git bisect good 22f07b86d4e580424cbeb0ce232ed30d4b5ecb95
# bad: [26ab12bb9d96133b7880141d68b5e01a8783de9d] iommu/hyper-v: Remove I/O-APIC ID check from hyperv_irq_remapping_select()
git bisect bad 26ab12bb9d96133b7880141d68b5e01a8783de9d
# good: [341b4a7211b6ba3a7089e1dc09ac4bd576dfb05f] x86/ioapic: Cleanup IO/APIC route entry structs
git bisect good 341b4a7211b6ba3a7089e1dc09ac4bd576dfb05f
# good: [79eb3581bcaae9b5677629d945e14da212aa76e2] iommu/vt-d: Simplify intel_irq_remapping_select()
git bisect good 79eb3581bcaae9b5677629d945e14da212aa76e2
# good: [d981059e13ffa9ed03a73472e932d070323bd057] x86/hyperv: Enable 15-bit APIC ID if the hypervisor supports it
git bisect good d981059e13ffa9ed03a73472e932d070323bd057
# bad: [2fb6acf3edfeb904505f9ba3fd01166866062591] iommu/amd: Fix union of bitfields in intcapxt support
git bisect bad 2fb6acf3edfeb904505f9ba3fd01166866062591
# bad: [aec8da04e4d71afdd4ab3025ea34a6517435f363] x86/ioapic: Correct the PCI/ISA trigger type selection
git bisect bad aec8da04e4d71afdd4ab3025ea34a6517435f363
# bad: [f36a74b9345aebaf5d325380df87a54720229d18] x86/ioapic: Use I/O-APIC ID for finding irqdomain, not index
git bisect bad f36a74b9345aebaf5d325380df87a54720229d18
# first bad commit: [f36a74b9345aebaf5d325380df87a54720229d18] x86/ioapic: Use I/O-APIC ID for finding irqdomain, not index

Open a bug on https://bugzilla.kernel.org or contact the author of the bad commit.
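
For anyone who wants to read a bisect log like the one above programmatically, a minimal Python sketch follows. It relies only on the line shapes visible in this log (`git bisect good|bad <hash>` and `# first bad commit: [<hash>] ...`); the function name `parse_bisect_log` and the shortened `SAMPLE_LOG` are made up for illustration.

```python
import re
from typing import Dict, Optional, Tuple

# Line shapes taken from the log in this task; other git versions may differ.
VERDICT = re.compile(r"^git bisect (good|bad) ([0-9a-f]+)$")
FIRST_BAD = re.compile(r"^# first bad commit: \[([0-9a-f]+)\]")

# Shortened excerpt of the log above, used only as sample input.
SAMPLE_LOG = """\
git bisect start
# bad: [f40ddce88593482919761f74910f42f4b84c004b] Linux 5.11
git bisect bad f40ddce88593482919761f74910f42f4b84c004b
# good: [2c85ebc57b3e1817b6ce1a6b703928e113a90442] Linux 5.10
git bisect good 2c85ebc57b3e1817b6ce1a6b703928e113a90442
# bad: [f36a74b9345aebaf5d325380df87a54720229d18] x86/ioapic: Use I/O-APIC ID for finding irqdomain, not index
git bisect bad f36a74b9345aebaf5d325380df87a54720229d18
# first bad commit: [f36a74b9345aebaf5d325380df87a54720229d18] x86/ioapic: Use I/O-APIC ID for finding irqdomain, not index
"""


def parse_bisect_log(text: str) -> Tuple[Dict[str, str], Optional[str]]:
    """Return ({commit hash: 'good' or 'bad'}, first bad commit or None)."""
    verdicts: Dict[str, str] = {}
    first_bad: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        match = VERDICT.match(line)
        if match:
            verdicts[match.group(2)] = match.group(1)
            continue
        match = FIRST_BAD.match(line)
        if match:
            first_bad = match.group(1)
    return verdicts, first_bad


if __name__ == "__main__":
    verdicts, first_bad = parse_bisect_log(SAMPLE_LOG)
    print(len(verdicts), "verdicts; first bad commit:", first_bad)
```

A saved log can also be fed back to git itself with `git bisect replay <logfile>` to restore the bisection state.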
Comment by Mthw (jari_45) - Tuesday, 02 March 2021, 08:53 GMT
I created a report at https://bugzilla.kernel.org/show_bug.cgi?id=212017. Hopefully someone looks at it. Thank you for your help.
Comment by loqs (loqs) - Tuesday, 02 March 2021, 21:58 GMT
https://drive.google.com/file/d/1f1L4_EEP5h3-hHdEZZS7giqB8WFtKyrU/view?usp=sharing linux-loqs-5.11-3-x86_64.pkg.tar.zst
This reverts f36a74b9345aebaf5d325380df87a54720229d18
Comment by Mthw (jari_45) - Wednesday, 03 March 2021, 08:10 GMT
And it works. Do you think it's worth trying to contact the maintainers of the linux package and ask if they would be willing to include this change?
Comment by loqs (loqs) - Wednesday, 03 March 2021, 13:01 GMT
I do not think the maintainers will take the revert without any input from upstream. Try emailing David Woodhouse <dwmw@amazon.co.uk> the author of the bad commit.
Comment by Mthw (jari_45) - Wednesday, 03 March 2021, 13:49 GMT
I already did.
Comment by Jordi (jordicoma) - Wednesday, 03 March 2021, 18:48 GMT
I just tested the last kernel provided here, and it works. Good work finding the problem.
If you need some logs (I don't know whether they would help with anything), I'll help.
Hope it's fixed in the main branch.
Comment by David Woodhouse (dwmw2) - Monday, 15 March 2021, 11:53 GMT
Comment by Mthw (jari_45) - Monday, 15 March 2021, 12:13 GMT
@loqs Can you please build it, so I can test it? Thanks.
Comment by Mthw (jari_45) - Monday, 15 March 2021, 19:38 GMT
No need anymore, I built it myself and it works correctly.
Comment by loqs (loqs) - Monday, 15 March 2021, 20:49 GMT
https://drive.google.com/file/d/1SRvnQjuFJsSdnV0JBzBW1osvjFFTOO7W/view?usp=sharing linux-loqs-5.11-4-x86_64.pkg.tar.zst
Applies https://lore.kernel.org/lkml/20210315111502.440451-1-dwmw2%40infradead.org/

Thank you very much for the patch David.
If you have time perhaps you could look into what looks to be another AMD IOMMU bug in 5.11  FS#69757  / https://bugzilla.kernel.org/show_bug.cgi?id=212069
Perhaps it is a duplicate but the bisection found a different commit to be the cause.
Comment by Mthw (jari_45) - Tuesday, 16 March 2021, 15:49 GMT
The build above also works.
Comment by mattia (nTia89) - Friday, 11 March 2022, 15:54 GMT
I cannot reproduce the issue. Is it still valid for you?
Comment by Mthw (jari_45) - Friday, 11 March 2022, 16:15 GMT
No, this should have been closed a long time ago.
