Please read this before reporting a bug:
https://wiki.archlinux.org/title/Bug_reporting_guidelines
Do NOT report bugs when a package is just outdated, or it is in the AUR. Use the 'flag out of date' link on the package page, or the Mailing List.
REPEAT: Do NOT report bugs for outdated packages!
FS#18682 - Illegal instruction in glibc's code
Attached to Project:
Arch Linux
Opened by Gilles Bedel (gillux) - Sunday, 14 March 2010, 21:33 GMT
Last edited by Allan McRae (Allan) - Monday, 15 March 2010, 21:48 GMT
Details
Description:
I'm experiencing random "Illegal instruction" crashes. They are random in the sense that I still can't reproduce them reliably and they vary from one occurrence to the next, although the same context sometimes comes back. I've been able to generate some coredumps that show the exact same error occurring in several applications: gcc (when compiling gcc, or sometimes mencoder), rtorrent, mlnet (from mldonkey) and bash. I guess it also affects the kernel, because I got many kernel panics, for which I can only see the end of the stack trace displayed on the screen. What I can see there varies (from what I remember): page_fault(), some TCP-related functions, and other things.
In all the coredumps I have (11), the illegal instruction _always_ occurs as shown in the attached file (typical_crash). It's within the libc _IO_vfscanf_internal function, called from somewhere I can't identify (since I'm unable to compile anything, I can't get the debug symbols). The instruction pointer is at _IO_vfscanf_internal+304.
Additional info:
I'm unable to compile anything "big" (such as gcc or mencoder) because the build always ends somewhere with an Illegal instruction error (either bash or gcc crashing). Smaller programs are OK.
These crashes are not easily reproducible. For example, I'm sure it will crash when compiling gcc, but I don't know exactly when. And when I rerun the faulty gcc command afterwards, it compiles the file without any error.
I've done 7 memtest passes without any error.
When the illegal instruction happens, the instruction pointer doesn't seem to be aligned with the assembler code. So my guess is that the problem may come from some glibc alignment mismatch on x86_64.
All the programs used come from the binary Arch Linux packages. Versions of the packages mentioned:
* core/glibc 2.11.1-1
* community/rtorrent 0.8.6-2
* core/bash 4.1.002-2
* core/gcc 4.4.3-1
* core/gcc-libs 4.4.3-1
* core/kernel26 2.6.32.7-1
typical_crash
I forgot to mention that the CPU temperatures are also fine: 40° under heavy load, 29° idle.
Another thing: I recently discovered that after several program crashes without a kernel panic, more and more processes begin to hang in uninterruptible sleep (D state in ps). But since wchan is not available (#17756), I can't see what's happening to them. And in the end I can't even reboot properly, because rc.shutdown also becomes stuck in D state...
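For anyone wanting to watch for the same symptom, here is a minimal sketch that lists processes currently in D state by reading /proc/<pid>/stat directly. It assumes a Linux /proc filesystem; the function names parse_stat_line and find_d_state are made up for this example. It only reports which processes are stuck, not what they are waiting on, so it is no substitute for wchan.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the first '(' and the last ')'.
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren].strip())
    comm = line[open_paren + 1:close_paren]
    # The state letter is the first field after the closing parenthesis.
    state = line[close_paren + 1:].split()[0]
    return pid, comm, state


def find_d_state(proc_root="/proc"):
    """Return a list of (pid, comm) for processes in uninterruptible sleep."""
    hung = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or stat was unreadable
        if state == "D":
            hung.append((pid, comm))
    return hung


if __name__ == "__main__":
    for pid, comm in find_d_state():
        print(pid, comm)
```

Running it periodically (for instance from a loop with a sleep) would show whether the number of stuck processes grows after each crash.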
I doubt it is a code bug in the toolchain, given the lack of consistency and the fact that you can continue the build after the crash. Hard-drive issues? Given that RAM issues are ruled out, perhaps try building in a RAM tmpfs and see if you get the same error.
Do you have any recollection of when these errors started occurring and what you updated around that time? Are all your core packages stock Arch versions?
I didn't know that memtest can sometimes miss errors. And indeed, this bug has been happening since a "major upgrade", which included a classic pacman -Syu but also 2 freshly bought RAM modules. Before, I had 1x 512MB; I have now added 2x 1GB modules, so all the motherboard slots are used. Of course, RAM clocks are the same for the 3 modules. I'll try removing the 512MB one.
As Jan de Groot experienced, my motherboard didn't like having all its memory slots filled, even though each memory module worked fine on its own. Strange...
Thank you again for all your support, you've been very helpful :)