FS#59300 - [linux-hardened] 4.17.x kernel panic on logout
Attached to Project:
Arch Linux
Opened by tom (archtom) - Wednesday, 11 July 2018, 15:27 GMT
Last edited by Levente Polyak (anthraxx) - Monday, 06 August 2018, 10:54 GMT
Details
Description:
A kernel panic occurs not on every logout, but at least on every third logout from openbox using openbox --exit. Logging out via loginctl terminate-session $XDG_SESSION_ID produces the same error. I don't know exactly which kernel version this started with, but I tried a few older ones and they all show the same error. The regular kernel in its latest version, 4.17.5-1, is fine and the error does not occur. I attached a picture, as the log does not contain any information about this after reboot. Thanks in advance for looking into it. Kind regards
To debug this issue, please compile the regular vanilla kernel from git (non-hardened, as upstream only accepts reports against it) with CONFIG_DEBUG_LIST=y, try to bisect the issue, and report the root of the problem to the upstream kernel.
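As a rough illustration of the build step, a minimal sketch follows. It assumes a mainline checkout in a hypothetical KERNEL_DIR, a running kernel that exposes /proc/config.gz, and function names (fetch_kernel, prepare_config, build_kernel) invented here for readability. Installing and booting the result is distribution-specific and not shown.

```shell
# Sketch only: fetch and build a vanilla kernel with list debugging enabled.
# KERNEL_DIR is a hypothetical checkout location, not taken from this report.
KERNEL_DIR="${KERNEL_DIR:-$HOME/src/linux}"

fetch_kernel() {
    # Clone the mainline tree (a large download).
    git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git "$KERNEL_DIR"
}

prepare_config() (
    cd "$KERNEL_DIR" || exit 1
    # Start from the running kernel's configuration (needs CONFIG_IKCONFIG_PROC).
    zcat /proc/config.gz > .config
    # scripts/config ships with the kernel tree; turn on linked-list sanity checks.
    scripts/config --enable DEBUG_LIST
    # Take defaults for options this tree knows that the old config does not.
    make olddefconfig
)

build_kernel() (
    cd "$KERNEL_DIR" || exit 1
    make -j"$(nproc)"
)
```

The functions only define the steps; nothing runs until they are called, for example `fetch_kernel && prepare_config && build_kernel`.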
I will gladly help solve this, but I would need a detailed step-by-step manual. Perhaps it's faster to try it yourself before writing the manual. Sorry, and thanks for any further help to solve this.
Is this procedure unsafe in any way, and can everything be deleted safely afterwards? Usually I would try this in my VirtualBox VM, but the error does not occur there. I don't feel good about doing all this on the production machine. It would mean causing a lot of kernel panics during testing on the system that holds all our data.
I would take the time, and I have already learned a lot by reading through your input, but I don't want to mess around with the production system, especially as my knowledge of kernel internals is not that good. Is there another way?
If not it would be really nice if you could do the debugging. Thanks a lot.
You would need to at least build an unpatched kernel with CONFIG_DEBUG_LIST=y and reproduce the issue on it before upstream would accept the report.
As further feedback: with the hardened kernel, the kernel panic sometimes also occurs on shutdown and reboot.
I will try hardened 4.16.16.a-1 tomorrow morning using the downgrade command and report back; after that I will have to use the "regular" version of the kernel for now. I will certainly try any potentially fixed version of the hardened kernel and gladly report back. Sorry that I cannot be of more assistance with this issue, as I cannot test it in VirtualBox.
Thanks for maintaining the kernel and for all the help.
The other thing is, whether you or your CEO like it or not: the BUG, i.e. the kernel oops, is triggered via DEBUG_LIST and BUG_ON_DATA_CORRUPTION and makes the kernel halt... but the linked list corruption itself is, quite frankly, still there with the regular vanilla kernel as well (which is bad!). You really want to debug and report the cause of the corrupted linked list, as something related to your hardware/driver/environment definitely corrupts it. Just closing your eyes won't magically fix the corruption; it could still potentially eat your kittens.
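To see which of the two options a given kernel was built with, a small check along these lines may help. It is a sketch: check_list_debug is a hypothetical helper name, and it expects a plain-text kernel config file as its argument.

```shell
# Report whether the two options mentioned above are set in a kernel config file.
# check_list_debug is a hypothetical helper; $1 is a plain-text config path.
check_list_debug() {
    config_file="$1"
    for opt in CONFIG_DEBUG_LIST CONFIG_BUG_ON_DATA_CORRUPTION; do
        if grep -q "^${opt}=y" "$config_file"; then
            echo "$opt: enabled"
        else
            echo "$opt: not enabled"
        fi
    done
}

# Example for the running kernel (only works if /proc/config.gz exists):
#   zcat /proc/config.gz > /tmp/running.config && check_list_debug /tmp/running.config
```

A kernel without these options does not halt on the corrupted list, which is why the regular kernel appears fine.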
It's understandable that you don't want to toy with a production system, but maybe you can get a comparable environment with as similar hardware components as possible?
I took some time (and took over the company production system ;)) for a while and tried the precompiled hardened versions.
It turns out that the bug / bad commit must be somewhere between 4.15.18.a-1 and 4.16.5.a-1. Sorry, this is the best and most detailed I could come up with as there are no other precompiled versions of the 4.16.x series below .5.
I hope it helps in any way and gives you a chance to go after it.
Thanks in advance
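For reference, the range found above could seed a bisect of the vanilla tree roughly as follows. This is a sketch under assumptions: that mainline v4.15 behaves like 4.15.18.a-1 (good) and v4.16 like 4.16.5.a-1 (bad), which has to be verified by hand first, and that start_bisect and its checkout path argument are hypothetical.

```shell
# Sketch: start a bisect between the mainline releases bracketing the reported range.
# Assumption: v4.15 is good and v4.16 is bad; confirm both before trusting the result.
start_bisect() (
    # $1 = path to a mainline kernel checkout (hypothetical location)
    cd "$1" || exit 1
    git bisect start
    git bisect bad v4.16
    git bisect good v4.15
)

# After building and booting each commit that git checks out, run inside the checkout:
#   git bisect good    # no panic after several logouts
#   git bisect bad     # panic reproduced
#   git bisect reset   # when finished, return to the original branch
```

Because the panic only shows up on roughly every third logout, a commit should be marked good only after several clean logout attempts.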
So either you can convince your CEO to get an equivalent testing environment so we can track this down, or there is no point.
Using an old kernel potentially exposes the system to security issues, and using the regular kernel just avoids the panic/BUG while the list is still corrupted internally.
Thanks for all the help, and sorry that I cannot contribute more to solving this. Have a great Sunday.
Thanks for all the help and for maintaining the kernel, very much appreciated.
thanks a lot for giving feedback
cheers