FS#10381 - [kernel26] mkinitcpio image causes kernel panic

Attached to Project: Arch Linux
Opened by Paul Sadauskas (rando) - Saturday, 10 May 2008, 05:50 GMT
Last edited by Andrea Scarpino (BaSh) - Monday, 15 June 2009, 07:11 GMT
Task Type Bug Report
Category Packages: Extra
Status Closed
Assigned To Thomas Bächler (brain0)
Architecture i686
Severity Medium
Priority Normal
Reported Version 2007.08-2
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:

I get a kernel panic from the image mkinitcpio builds. If I copy over the img from the install cd, and add a grub entry to boot from it using the same kernel, that one works. I've tweaked several of the mkinitcpio modules/hooks, but I always get the same error message:

:: Loading Initramfs
export: 36: X: bad variable name
Kernel panic - not syncing: Attempted to kill init!

I can't decipher that error message, and the svn server for mkinitcpio mentioned on the wiki page seems to be down, so I can't check the source. I extracted both the CD's .img and mine and diffed the init files, but they looked very similar. Grepping for 'export' didn't reveal anything obvious. The only difference of consequence I saw was that the CD's image had #!/bin/bash, while mine had #!/bin/sh.

Additional info:
Tried with both the mkinitcpio available from the 2008.03-3 CD and the latest available in core (0.5.18.1-1). The error was slightly different (same "export" and "bad variable name", but a different line(?) number).
The hwd -e output is available here: http://bbs.archlinux.org/viewtopic.php?id=47996


Closed by  Andrea Scarpino (BaSh)
Monday, 15 June 2009, 07:11 GMT
Reason for closing:  Not a bug
Additional comments about closing:  No responses in 10 months. Please reopen if necessary.
Comment by Thomas Bächler (brain0) - Sunday, 11 May 2008, 08:06 GMT
Comment by Paul Sadauskas (rando) - Sunday, 11 May 2008, 17:20 GMT
From my initial poking, it looks like ./init:33 ( http://projects.archlinux.org/?p=mkinitcpio.git;a=blob;f=init;h=370639191d4bc082edefe54d18001f52f4c9f7aa;hb=HEAD#l33 ) is `export "${cmd}=y"`. I suppose if `cmd` is "X", that /could/ cause this error. I'll keep investigating.
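A minimal sketch, assuming a POSIX shell such as dash, of why the value of `cmd` matters: `export` rejects the *name*, not the value, so a plain `X` is accepted while a name that starts with a non-printable byte is not. The helper `try_export` is hypothetical and only illustrates the shell's behaviour; it is not part of mkinitcpio.

```shell
#!/bin/sh
# Hypothetical helper (not part of mkinitcpio): run `export NAME=y` in a
# subshell so a failure cannot kill the calling shell, and return the
# subshell's exit status. A non-interactive dash aborts on
# "bad variable name"; the subshell contains that abort.
try_export() {
    ( export "$1=y" ) 2>/dev/null
}

# A plain identifier is accepted.
if try_export X; then echo "X: accepted"; fi

# A leading byte 0x83 (octal 203) makes the name invalid.
if ! try_export "$(printf '\203X')"; then echo "0x83 X: rejected"; fi
```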
Comment by Thomas Bächler (brain0) - Monday, 12 May 2008, 11:45 GMT
This is indeed a line number, but there is no export on line 36. Can you post your kernel commandline?
Comment by Paul Sadauskas (rando) - Monday, 12 May 2008, 14:06 GMT
Whoops, right. I think the original error message when using the mkinitcpio version that was on the CD was 33.

My kernel commandline is pretty uninteresting:

root (hd2,0)
kernel /vmlinuz26 root=/dev/sdc3 ro
initrd /kernel26.img

Comment by Thomas Bächler (brain0) - Monday, 12 May 2008, 16:16 GMT
Then there should be nothing wrong really. Does it literally say 'X: bad variable name'?
Comment by Paul Sadauskas (rando) - Monday, 12 May 2008, 19:30 GMT
Yeah, that's the last three lines I get, verbatim.

Where is that error from, exactly? Is it actually in init line 36? Or am I looking in the wrong place entirely?

Comment by Aaron Griffin (phrakture) - Tuesday, 13 May 2008, 11:25 GMT
More info needed.

What about the fallback image? Are you using grub or lilo? Are you sure /boot is mounted?

I recommend putting "echo"s all over the initcpio init script (located in /usr/lib/initcpio/) to see if you can track down WHERE this error actually occurs. The only export actually used is when parsing the kernel command line...
Comment by Thomas Bächler (brain0) - Tuesday, 13 May 2008, 11:25 GMT
It should be ... I think.

Can you please do me a favor and use an unmodified mkinitcpio 0.5.18.1 or mkinitcpio from git (as long as you tell me which one), reproduce the error and post it here again? The "version from the CD" is not worth too much I'm afraid, as I have no idea which one it is. Your whole mkinitcpio.conf would also be nice to see. And the initramfs file (kernel26.img).
Comment by Paul Sadauskas (rando) - Tuesday, 13 May 2008, 15:16 GMT
Here are the files. This is using mkinitcpio-0.5.18.1-1 from pacman. I'll try the git version, and post the results here in a few minutes.

@Aaron: Can you point me at some instructions to pack the init back up into a .img? I'm using the zcat method from here http://wiki.archlinux.org/index.php/Mkinitcpio#Getting_under_the_hood to unpack it. Thanks
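For anyone with the same question: the wiki's unpack step and its inverse can be sketched as follows, assuming the image is a gzip-compressed cpio archive in the `newc` format (the format the kernel expects for an initramfs) and that `gzip`, `cpio` and `find` are installed. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for editing an initramfs image by hand.
# Assumes: gzip-compressed cpio archive in newc format.

# unpack_initramfs IMAGE DIR: extract IMAGE into DIR.
unpack_initramfs() {
    img=$1 dir=$2
    # Make the image path absolute, since we cd into DIR before reading it.
    case $img in /*) ;; *) img=$PWD/$img ;; esac
    mkdir -p "$dir" || return 1
    # -d creates leading directories; stderr hides cpio's block count.
    (cd "$dir" && gzip -dc "$img" | cpio -i -d 2>/dev/null)
}

# repack_initramfs DIR IMAGE: archive the contents of DIR into IMAGE.
repack_initramfs() {
    dir=$1 out=$2
    case $out in /*) ;; *) out=$PWD/$out ;; esac
    (cd "$dir" && find . | cpio -o -H newc 2>/dev/null | gzip -9 > "$out")
}
```

Usage would be along the lines of `unpack_initramfs /boot/kernel26.img /tmp/initrd`, editing `/tmp/initrd/init`, then `repack_initramfs /tmp/initrd /boot/kernel26-test.img` and pointing a separate grub entry at the new image so the working one stays untouched.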

Comment by Paul Sadauskas (rando) - Tuesday, 13 May 2008, 15:24 GMT
Ok, made a new image from git. The exact error message I get is:

...
Using IPI No-Shortcut mode
Freeing unused kernel memory: 292k freed
:: Loading Initramfs
export: 36: *X: bad variable name
Kernel panic - not syncing: Attempted to kill init!

* Where I put the asterisk above is actually a solid box. I'm assuming it's printing some character that my console font can't display.

I'm posting this from my laptop, so I can read the error message exactly. I'll attach the .img in the next comment.
Comment by Aaron Griffin (phrakture) - Tuesday, 13 May 2008, 16:08 GMT
What file format is your grub.conf? Make sure it is unix formatted, not dos formatted.
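A quick way to test the CRLF theory, sketched under the assumption that a DOS-formatted file is recognisable by containing carriage returns (byte 0x0D). The helper name `has_cr` is hypothetical.

```shell
#!/bin/sh
# has_cr FILE (hypothetical helper): succeed if FILE contains at least
# one carriage return, i.e. it has DOS (CRLF) line endings somewhere.
has_cr() {
    # tr -dc '\r' deletes every byte except CR; anything left means a CR exists.
    [ -n "$(tr -dc '\r' < "$1")" ]
}
```

On the affected machine this would be run as, for example, `has_cr /boot/grub/menu.lst && echo "DOS line endings"`.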
Comment by Thomas Bächler (brain0) - Tuesday, 13 May 2008, 16:14 GMT
Ha, now we're getting closer. Can you attach your menu.lst as well? This looks like mkinitcpio tries to use some non-printable character as the variable name.
Comment by Paul Sadauskas (rando) - Tuesday, 13 May 2008, 16:29 GMT
So after more poking and echos in init, it looks like CMDLINE is, in fact, "*X", where * is some unprintable character.

@Aaron: I missed your comment about where to find the init, and was attempting to repack the one that I had unpacked. I got it figured out, though, so disregard my earlier question.
Comment by Paul Sadauskas (rando) - Tuesday, 13 May 2008, 16:49 GMT

I haven't modified my menu.lst from the install except to remove extraneous comments and add the one entry to boot off the CD's init img. It's a UNIX file.

Also attached is my modified init, with the "*X" being echoed.

===================

% file /boot/grub/menu.lst
/boot/grub/menu.lst: ASCII English text

% cat /boot/grub/menu.lst
# Config file for GRUB - The GNU GRand Unified Bootloader
# /boot/grub/menu.lst

# general configuration:

timeout 5
default 0
color light-blue/black light-cyan/blue

# (0) Arch Linux
title Arch Linux
root (hd2,0)
kernel /vmlinuz26 root=/dev/sdc3 ro
initrd /kernel26.img

# (1) Arch Linux
title Arch Linux Fallback
root (hd2,0)
kernel /vmlinuz26 root=/dev/sdc3 ro
initrd /kernel26-fallback.img

# (1) Arch Linux
title Arch Linux Install CD initrd
root (hd2,0)
kernel /vmlinuz26 root=/dev/sdc3 ro
initrd /kernel26.img.cd
   init (4 KiB)
Comment by Thomas Bächler (brain0) - Tuesday, 13 May 2008, 16:55 GMT
Just cat'ing the file doesn't help. I wanted you to attach it so I might get a clue where these unreadable signs came from. Where in CMDLINE is this stuff, in the beginning or at the end?
Comment by Aaron Griffin (phrakture) - Tuesday, 13 May 2008, 17:00 GMT
I'm still voting for CRLF! As Thomas said, please attach the file as is, without modifications.
Comment by Paul Sadauskas (rando) - Tuesday, 13 May 2008, 17:03 GMT
This is it
   menu.lst (0.6 KiB)
Comment by Paul Sadauskas (rando) - Tuesday, 13 May 2008, 17:06 GMT
It's the only thing in cmdline. The output from that init is:

:: Loading Initramfs
mounting
reading cmdline
modprobe
CMDLINE=*X
cmd=*X
export: 41: *X: bad variable name
Kernel panic - not syncing: Attempted to kill init!
Comment by Aaron Griffin (phrakture) - Tuesday, 13 May 2008, 17:09 GMT
Hmmm, I think the web interface may be sanitizing it. Can you post the output of "cat -A menu.lst" ?
Comment by Aaron Griffin (phrakture) - Tuesday, 13 May 2008, 17:11 GMT
Another option:
mv menu.lst menu.borked
cat menu.borked > menu.lst

And try that one
Comment by Paul Sadauskas (rando) - Tuesday, 13 May 2008, 17:15 GMT
% cat -A /boot/grub/menu.lst
# Config file for GRUB - The GNU GRand Unified Bootloader$
# /boot/grub/menu.lst$
$
# general configuration:$
timeout 5$
default 0$
color light-blue/black light-cyan/blue$
$
# (0) Arch Linux$
title Arch Linux$
root (hd2,0)$
kernel /vmlinuz26 root=/dev/sdc3 ro$
initrd /kernel26.img$
$
# (1) Arch Linux$
title Arch Linux Fallback$
root (hd2,0)$
kernel /vmlinuz26 root=/dev/sdc3 ro$
initrd /kernel26-fallback.img$
$
# (1) Arch Linux$
title Arch Linux Install CD initrd$
root (hd2,0)$
kernel /vmlinuz26 root=/dev/sdc3 ro$
initrd /kernel26.img.cd$
$
# (1) Windows$
title Windows$
rootnoverify (hd0,0)$
makeactive$
chainloader +1$


Also attached the output. Additionally:

# cd /boot/grub/
# mv menu.lst menu.bad
# cat menu.bad > menu.lst
# diff menu.bad menu.lst
(no output)
   menu.cat (0.6 KiB)
Comment by Thomas Bächler (brain0) - Tuesday, 13 May 2008, 17:36 GMT
Okay, it's not your menu.lst.

Instead of returning your kernel command line, "read CMDLINE </proc/cmdline" returns complete nonsense. This could be a problem in the kernel or in klibc or dash (the klibc shell). I have no idea where to go from here.
Comment by Paul Sadauskas (rando) - Tuesday, 13 May 2008, 17:36 GMT
I tried commenting out the entire "for cmd in ${CMDLINE}" block, and it makes it past that part. It fails to boot, though, because I assume it can't find the root device.
Comment by Paul Sadauskas (rando) - Tuesday, 13 May 2008, 17:41 GMT
Ok, I left that block commented out, and hardcoded 'export root=/dev/sdc3' into the init. This image works, I can boot.

Sure would be awesome to know what was screwing up, though. `cat /proc/cmdline` returns that same bogus "*X" after I boot. (This is using zsh).
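Instead of commenting out the whole loop and hardcoding `root`, a more defensive loop could skip any word whose name part is not a valid shell identifier and keep booting. This is a minimal sketch, not mkinitcpio's actual init: the function names `is_valid_name` and `parse_cmdline`, the `name=value` / bare-word handling and the warning text are assumptions modelled on the `export "${cmd}=y"` line quoted earlier in this report.

```shell
#!/bin/sh
# Hypothetical defensive replacement for a "for cmd in ${CMDLINE}" loop.
# Not mkinitcpio's code: assumes each word is either name=value or a
# bare flag that should become name=y.
LC_ALL=C; export LC_ALL   # byte-wise character ranges in the patterns below

# is_valid_name NAME: succeed if NAME is a legal shell variable name
# (letter or underscore, then letters, digits or underscores).
is_valid_name() {
    case $1 in
        ''|[!A-Za-z_]*|*[!A-Za-z0-9_]*) return 1 ;;
        *) return 0 ;;
    esac
}

# parse_cmdline WORDS...: export each valid word, warn about the rest.
parse_cmdline() {
    for cmd in "$@"; do
        case $cmd in
            *=*) name=${cmd%%=*} value=${cmd#*=} ;;  # split at first '='
            *)   name=$cmd value=y ;;                # bare flag -> name=y
        esac
        if is_valid_name "$name"; then
            export "$name=$value"
        else
            echo "skipping invalid kernel parameter" >&2
        fi
    done
}
```

In an init script it would be called as `read CMDLINE </proc/cmdline; parse_cmdline $CMDLINE`, with `$CMDLINE` deliberately unquoted so the shell splits it into words.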

Comment by Paul Sadauskas (rando) - Tuesday, 13 May 2008, 17:49 GMT
Looking at /proc/cmdline in most gives me:

<83>X^C
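To see exactly which bytes are there, rather than how a terminal or pager renders them, a hex dump helps. The `<83>X^C` rendering above presumably corresponds to the bytes 0x83, 0x58 ('X') and 0x03. A sketch using POSIX `od` and `tr`; both helper names are hypothetical.

```shell
#!/bin/sh
# dump_bytes FILE (hypothetical): print each byte of FILE as two hex
# digits, whitespace-separated, with no address column.
dump_bytes() {
    od -An -tx1 "$1"
}

# has_nonprintable FILE (hypothetical): succeed if FILE contains bytes
# other than printable ASCII (0x20-0x7e) and newline.
has_nonprintable() {
    [ -n "$(LC_ALL=C tr -d '[:print:]\n' < "$1")" ]
}
```

On the affected machine, `dump_bytes /proc/cmdline` and `has_nonprintable /proc/cmdline` would show the raw content; a healthy command line should consist of printable ASCII followed by a trailing newline (`0a`).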

Comment by Thomas Bächler (brain0) - Wednesday, 14 May 2008, 10:06 GMT
This confuses me. Either grub actually passes that as your commandline, or your kernel is somehow broken. Have you tried simply updating/reinstalling the kernel? Have you tried reinstalling grub to the MBR? Can you look into dmesg what your command line looks like (a line starting with "Command line:")?
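Comparing the boot log with /proc/cmdline is one way to split the problem: the kernel logs the command line it received early in boot. If that logged line is already garbled, the damage likely happened before /proc comes into play (the bootloader hand-off or early kernel setup); if it is clean, suspicion shifts to /proc/cmdline itself. A hedged sketch, assuming the log line contains "command line:" (the exact prefix varies, e.g. "Kernel command line:" or "Command line:"); the helper name is hypothetical and reads the log on standard input.

```shell
#!/bin/sh
# logged_cmdline (hypothetical): read a kernel log on stdin and print the
# text after the first "Command line:" / "Kernel command line:" marker.
# LC_ALL=C keeps sed byte-oriented in case the line has invalid bytes.
logged_cmdline() {
    LC_ALL=C sed -n 's/^.*[Cc]ommand line: //p' | head -n 1
}
```

On the affected machine, `dmesg | logged_cmdline` can then be compared with `cat /proc/cmdline`.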
Comment by Paul Sadauskas (rando) - Wednesday, 14 May 2008, 14:51 GMT
I've reinstalled grub several times, and I have also reinstalled the kernel a couple times. I'm not at that machine right now, but I'll check dmesg when I get home.
Comment by Aaron Griffin (phrakture) - Wednesday, 14 May 2008, 17:10 GMT
I think it's the procfs being flaky here. If the actual contents of /proc/cmdline are borked, then we have bigger problems. Would you possibly want to run a memtest for me (it's on the RC ISOs) to make sure your RAM is all OK?

If that's not the problem... how about printing out some other /proc fields to make sure it's all ok.
Comment by Paul Sadauskas (rando) - Thursday, 15 May 2008, 00:29 GMT
/proc/cpuinfo, meminfo, modules, diskstats are all what I expect. I've been using the system for a week, and (aside from the booting) all is OK.

I upgraded the RAM in this a few months ago, and ran an overnight memtest when I did, it all checked out. I also haven't had any other issues developing or compiling on it.

My guess would be that it has to have something to do with grub passing bogus stuff to the kernel, but I have no idea how. Maybe I should give lilo a shot, see what happens.
Comment by Glenn Matthys (RedShift) - Tuesday, 17 June 2008, 11:08 GMT
What's the status of this issue? Can we get feedback from the original bug reporter?
Comment by Paul Sadauskas (rando) - Tuesday, 17 June 2008, 15:02 GMT
I finally worked around this as I mentioned above (http://bugs.archlinux.org/task/10381#comment28089). I haven't had much time to investigate further. This evening or tomorrow I'll try upgrading the kernel. If that doesn't fix it, then it's either a kernel bug or a grub bug. I'm not at that machine right now, so I don't have specific version numbers, but I've been running Gentoo for about a year, through several keyworded kernel versions, and Ubuntu 8.04 also installs and runs cleanly. I suppose I never checked /proc/cmdline in any of those, however, since I never had any reason to. I'll try to track down which versions of the kernel and grub I was using.
Comment by Aaron Griffin (phrakture) - Tuesday, 17 June 2008, 15:14 GMT
Ok, so I have this bug now! Yay. I figured something went all borky when I moved, then remembered this bug. I get '36' as well, which makes me even more confused. It's NOT just a random number.

The weird part is, I downgraded to 2.6.24.4 and it STILL persisted.
Comment by Glenn Matthys (RedShift) - Tuesday, 17 June 2008, 15:25 GMT
phrakture: can you add stuff like echo "foo1", echo "foo2", etc... to certain points in /lib/initcpio/init? So we can pinpoint where that X message is actually coming from?
Comment by Glenn Matthys (RedShift) - Tuesday, 17 June 2008, 15:26 GMT
Or show us how we can reproduce this bug?
Comment by Aaron Griffin (phrakture) - Tuesday, 17 June 2008, 15:29 GMT
I have no idea what is causing it.
When I get home I'm going to add a super early break and then do all the init steps by hand, to see if I can figure this out.
Comment by Aaron Griffin (phrakture) - Tuesday, 17 June 2008, 15:42 GMT
Paul, if you get this before I do: could you possibly try switching the following:

read CMDLINE </proc/cmdline
to
CMDLINE="$(/bin/cat /proc/cmdline)"

This was suggested by Dan and may actually be worth a shot.
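For an ordinary command line the two forms should capture the same text, and neither can repair bytes that are already wrong in /proc/cmdline. The main behavioural difference is that `read` without `-r` treats backslashes as escape characters and trims leading and trailing whitespace, while `$(cat ...)` only strips trailing newlines. A sketch with hypothetical helper names that reads the same file both ways for comparison:

```shell
#!/bin/sh
# read_with_read FILE (hypothetical): first line of FILE via the read
# builtin, without -r, as in `read CMDLINE </proc/cmdline`.
read_with_read() {
    read line < "$1"
    printf '%s' "$line"
}

# read_with_cat FILE (hypothetical): whole FILE via command substitution,
# as in `CMDLINE="$(/bin/cat /proc/cmdline)"`; trailing newlines removed.
read_with_cat() {
    printf '%s' "$(cat "$1")"
}
```

If the two results differ on a real system, the difference would point at shell parsing; if they match and still show the odd byte, the content of /proc/cmdline itself is the issue.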
Comment by Paul Sadauskas (rando) - Tuesday, 17 June 2008, 16:31 GMT
Aaron: When I type `cat /proc/cmdline` once I've booted (or `less`, etc...) I get that same unprintable character.

I'll summarize what I know here; maybe it'll help you figure out where to go next.

The number that's printed with the error is the line number of the init script where the error occurs. For me, it was the "done" at the end for the "for cmd in ${CMDLINE}" loop. I think that's because dash (or anything else) can loop over the invalid chars in /proc/cmdline.

It either has to be the kernel screwing up /proc/cmdline, or grub munging it up when passing it into the kernel. The kernel on the install CD (2008-03) works, but I'm not sure if that uses grub or something else for booting.
Comment by Aaron Griffin (phrakture) - Tuesday, 17 June 2008, 17:00 GMT
That's exactly where I am right now. The kernel and grub from the 2008.04 CD work fine, but I can not get my actual system to boot. I need a good sit-down at home tonight to figure this out. I got real frustrated yesterday because my internet broke at the same time 8)
Comment by Roman Kyrylych (Romashka) - Tuesday, 21 October 2008, 11:43 GMT
status on this?
Comment by Glenn Matthys (RedShift) - Thursday, 11 December 2008, 07:24 GMT
What's the status of this issue?
