FS#36349 - [grub] grub-install fails when /boot is on a software RAID1 LVM volume

Attached to Project: Arch Linux
Opened by Bogdan Szczurek (thebodzio) - Wednesday, 31 July 2013, 21:07 GMT
Last edited by Tobias Powalowski (tpowa) - Wednesday, 07 August 2013, 07:23 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To Tobias Powalowski (tpowa)
Ronald van Haren (pressh)
Architecture All
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:

grub-install fails, if “/boot” is on a software RAID1 LVM volume, with the following message:

Path '/boot/grub' is not readable by GRUB on boot. Installation is impossible. Aborting.

Digging a little bit deeper (grub-install --debug) reveals:

+ is_path_readable_by_grub /boot/grub
+ path=/boot/grub
+ test -e /boot/grub
+ :
+ /usr/bin/grub-probe -t fs /boot/grub
+ return 1
+ gettext_printf 'Path `%s'\'' is not readable by GRUB on boot. Installation is impossible. Aborting.\n' /boot/grub
+ gettext_printf_format='Path `%s'\'' is not readable by GRUB on boot. Installation is impossible. Aborting.\n'
+ shift
++ gettext 'Path `%s'\'' is not readable by GRUB on boot. Installation is impossible. Aborting.\n'
+ printf 'Path `%s'\'' is not readable by GRUB on boot. Installation is impossible. Aborting.\n' /boot/grub
Path `/boot/grub' is not readable by GRUB on boot. Installation is impossible. Aborting.
+ exit 1

The “+ /usr/bin/grub-probe -t fs /boot/grub” step is particularly interesting, since its failure aborts the whole process. Running “grub-probe -t fs /boot/grub” with the additional “-v” flag gives:

grub-probe: info: cannot open `//boot/grub/device.map': No such file or directory.
grub-probe: info: changing current directory to /dev/mapper.
grub-probe: info: /dev/dm-1 is an LVM.
mdadm: cannot open /dev/md0 : No such file or directory
grub-probe: error: cannot open `/dev/md0 ': No such file or directory.

The last 2 lines seem to be the explanation. It appears that “mdadm”, called via a pipe from “grub-probe”, is given not “/dev/md0” but “/dev/md0 ” (notice the trailing spaces). Calling “mdadm” directly from the CLI with the latter parameter reproduces the same error:

$ sudo mdadm '/dev/md0 '
mdadm: cannot open /dev/md0 : No such file or directory

while:

$ sudo mdadm '/dev/md0'
/dev/md0: 19.98GiB raid1 2 devices, 0 spares. Use mdadm --detail for more detail.
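
The effect of the trailing spaces can be illustrated with a minimal sketch. It is not GRUB's code: the helper name normalize_device_path is hypothetical, and a temporary file stands in for /dev/md0. It only shows that a path with trailing spaces names a different, non-existent file on Linux, and that trimming the string before use avoids the problem.

```python
import os
import tempfile


def normalize_device_path(raw: str) -> str:
    """Strip leading and trailing whitespace from a device path read from tool output."""
    return raw.strip()


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for /dev/md0; a real block device is not needed for the illustration.
    real = os.path.join(tmp, "md0")
    open(real, "w").close()

    # Hypothetical padded value, similar to what mdadm received in the log above.
    padded = real + "  "

    # On Linux the padded string is simply another (missing) file name.
    print("padded exists: ", os.path.exists(padded))
    print("trimmed exists:", os.path.exists(normalize_device_path(padded)))
```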

Unfortunately that's all the information I can provide right now :{. I'd rather have some patch ready, but… uh… well… time… ;}.

I suspect it to be an upstream bug, but I wanted a second opinion before pushing it any further.

Additional info:
* package version(s)

Problem exists with the latest grub build: 2.00.5043-2, but the same happened with the previous “grub” – the last one consisting of “grub2-common” and “grub2-bios” packages.

* config and/or log files etc.

Problem exists on both i686 and x86_64 systems – tested on 2 separate setups. Both setups are BIOS based, each of them utilizing 2 GPT partitioned disks (/dev/sda and /dev/sdb), with 2 MiB partitions of EF02 type at the very beginning of the disks, like so (one setup's partition layout given for brevity):

Number  Start (sector)  End (sector)  Size      Code  Name
     1            2048          6143  2.0 MiB   EF02  BIOS boot partition
     2            6144      41949183  20.0 GiB  8E00  Linux LVM
     3        41949184     156299341  54.5 GiB  8E00  Linux LVM

The 2nd and 3rd partitions of each disk are members of software RAID1 arrays (cat /proc/mdstat):

Personalities : [raid1]
md1 : active raid1 sdb3[2] sda3[0]
      57142208 blocks super 1.2 [2/2] [UU]

md0 : active raid1 sdb2[2] sda2[0]
      20955008 blocks super 1.2 [2/2] [UU]

Each array contains a physical volume hosting a separate volume group (md0 – VG “system”, md1 – VG “data”). Volume group “system” contains the logical volume “boot”. Each LV uses the XFS file system.

In short: BIOS system, 2 HDs, GPT with protective MBR, a 2 MiB EF02 partition on each drive, plus 2 RAID1 arrays (md0, md1) with LVM volumes using XFS.

Arch's BBS seems to contain a description of a similar problem (https://bbs.archlinux.org/viewtopic.php?id=146521); however, in that case there was no RAID1 and no LVM – just plain partitions with file systems on them.

Steps to reproduce:

Run “grub-install --recheck --debug /dev/sda” on a system where “/boot” is on an LVM volume placed on a software RAID1 array. “--recheck” and “--debug” are optional and don't change the results.

Closed by  Tobias Powalowski (tpowa)
Wednesday, 07 August 2013, 07:23 GMT
Reason for closing:  Fixed
Comment by Tobias Powalowski (tpowa) - Monday, 05 August 2013, 08:40 GMT
I cannot reproduce this bug here.
grub-install \
--directory="/usr/lib/grub/i386-pc" \
--target="i386-pc" \
--boot-directory="/boot" \
--recheck \
--debug \
"/dev/sda"

always reports success here.

Ah, sorry, I don't have LVM, only RAID1.
Comment by Bogdan Szczurek (thebodzio) - Monday, 05 August 2013, 10:02 GMT
Exactly! RAID + LVM seems to be a showstopper…
Comment by Michael Chang (seally_1186) - Monday, 05 August 2013, 10:37 GMT
I think I found the space: (got command from grub source)
$ sudo vgs --options pv_name --noheadings <vgname> | cat -A
/dev/sda2 $
(Note there are two spaces at the beginning that don't show up in the comment. These are handled fine.)

As you can see, I'm not using RAID + LVM, but apparently GRUB isn't handling that extra space at the end.

The question is whether this is a GRUB problem or an LVM problem.

EDIT: Looks like LVM is just trying to align the columns. Curious why this was never an issue before.

Not sure if this is the right hack/fix, but seems like something like this should fix GRUB (--rows prevents alignment). (WARNING: Completely untested. ...Sorry.)
Comment by Bogdan Szczurek (thebodzio) - Monday, 05 August 2013, 11:21 GMT
A good question it is :}

In the meantime, I took your snippet and ran it on my systems. The results are very interesting. On the first system (A) the output for the “system” VG was:

“ /dev/md127$”

on the second (B):

“ /dev/md0 $”

So, it seems that “lvm” is allocating the spaces “just in case”.

EDIT: Nope, it's an attempt to align columns, just like you've said.
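
The difference between the two systems can be sketched as follows. This is only an illustration under a loud assumption: the minimum column width of 10 is a guess that happens to match the outputs quoted in this thread (“/dev/md127” unpadded, “/dev/sda2” and “/dev/md0” padded), not a confirmed LVM value. The function names render_pv_column and parse_pv_column are hypothetical.

```python
# ASSUMPTION: a minimum column width of 10 is inferred from the outputs quoted
# in this report; it is not taken from the LVM source or documentation.
ASSUMED_MIN_WIDTH = 10


def render_pv_column(pv_name: str, min_width: int = ASSUMED_MIN_WIDTH) -> str:
    """Mimic an aligned column: two leading spaces, name left-justified to min_width."""
    return "  " + pv_name.ljust(min_width)


def parse_pv_column(line: str) -> str:
    """Recover the device name regardless of alignment padding."""
    return line.strip()


for name in ("/dev/md127", "/dev/sda2", "/dev/md0"):
    line = render_pv_column(name)
    print(repr(line), "->", repr(parse_pv_column(line)))
```

Under that assumption a 10-character name such as “/dev/md127” gets no trailing padding, while shorter names do, and stripping the line on the consumer's side makes the padding irrelevant either way.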

What is more interesting, on system A “grub-probe -t fs -v /boot/grub” returned “xfs” as it should, and “grub-install” succeeded, but with a couple of warnings:

sudo grub-install /dev/sda
File descriptor 4 (pipe:[177047]) leaked on vgs invocation. Parent PID 2189: /usr/bin/grub-probe
File descriptor 5 (pipe:[177445]) leaked on vgs invocation. Parent PID 2189: /usr/bin/grub-probe
/usr/bin/grub-probe: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image..
File descriptor 4 (pipe:[178264]) leaked on vgs invocation. Parent PID 2203: /usr/bin/grub-probe
File descriptor 5 (pipe:[177498]) leaked on vgs invocation. Parent PID 2203: /usr/bin/grub-probe
File descriptor 4 (pipe:[178379]) leaked on vgs invocation. Parent PID 2227: /usr/bin/grub-probe
File descriptor 5 (pipe:[177608]) leaked on vgs invocation. Parent PID 2227: /usr/bin/grub-probe
/usr/bin/grub-probe: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image..
File descriptor 4 (pipe:[178518]) leaked on vgs invocation. Parent PID 2238: /usr/bin/grub-probe
File descriptor 5 (pipe:[177667]) leaked on vgs invocation. Parent PID 2238: /usr/bin/grub-probe
Installation finished. No error reported.

On system B, the situation remained as reported in the bug description.

To sum things up, I lean towards partially blaming “lvm” for the situation (RAID indeed seems irrelevant here). However, the extra spaces could easily be trimmed on grub's side, so…
Comment by Tobias Powalowski (tpowa) - Monday, 05 August 2013, 18:34 GMT
Please try 2.00.5086 from testing, it should solve your issues.
Comment by Bogdan Szczurek (thebodzio) - Monday, 05 August 2013, 21:52 GMT
Results of running “grub-install” from 2.00.5086 are:

$ sudo grub-install --recheck /dev/sdb
File descriptor 4 (pipe:[141703]) leaked on vgs invocation. Parent PID 2129: /usr/bin/grub-probe
File descriptor 5 (pipe:[141697]) leaked on vgs invocation. Parent PID 2129: /usr/bin/grub-probe
File descriptor 4 (pipe:[141768]) leaked on vgs invocation. Parent PID 2142: /usr/bin/grub-probe
File descriptor 5 (pipe:[141762]) leaked on vgs invocation. Parent PID 2142: /usr/bin/grub-probe
File descriptor 4 (pipe:[141892]) leaked on vgs invocation. Parent PID 2166: /usr/bin/grub-probe
File descriptor 5 (pipe:[141886]) leaked on vgs invocation. Parent PID 2166: /usr/bin/grub-probe
File descriptor 4 (pipe:[141956]) leaked on vgs invocation. Parent PID 2177: /usr/bin/grub-probe
File descriptor 5 (pipe:[141950]) leaked on vgs invocation. Parent PID 2177: /usr/bin/grub-probe
Installation finished. No error reported.

It appears to work now (at least “grub-probe” returned “xfs” for my “/boot/grub”, as it is supposed to ;}).

About the leaked file descriptors: the messages seem to be harmless (at least for the results of “grub-probe”) and expected “lvm” behaviour (see e.g.: https://bugs.launchpad.net/ubuntu/+source/lvm2/+bug/591823, http://unix.stackexchange.com/questions/4931/leaking-file-descriptors and http://pikachu.3ti.be/pipermail/rear-users/2011-May/000993.html). However, these messages suggest there's some flaw in the way “grub-probe” handles the “lvm” invocation. I think it's an upstream bug, so I'll report it to the grub devs.
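
The general mechanism behind such warnings can be sketched like this. It is an illustration of descriptor inheritance on a POSIX system, not GRUB's or LVM's code: a parent leaves a pipe open across the spawn, and the child can see it unless the parent closes extra descriptors first. The helper child_view_of_fd and the child snippet are hypothetical names made up for the example.

```python
import os
import subprocess
import sys

# Child program: report whether the descriptor number passed in argv is open.
CHILD_CODE = (
    "import os, sys\n"
    "try:\n"
    "    os.fstat(int(sys.argv[1]))\n"
    "    print('inherited')\n"
    "except OSError:\n"
    "    print('not inherited')\n"
)


def child_view_of_fd(fd: int, close_fds: bool) -> str:
    """Spawn a child and return whether it sees descriptor `fd` as open."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, str(fd)],
        close_fds=close_fds,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


read_end, write_end = os.pipe()
# Mimic a pipe that the parent leaves open (without close-on-exec) for its child.
os.set_inheritable(read_end, True)

print("close_fds=False:", child_view_of_fd(read_end, close_fds=False))
print("close_fds=True: ", child_view_of_fd(read_end, close_fds=True))

os.close(read_end)
os.close(write_end)
```

Marking the pipe close-on-exec, or closing it before spawning the child, is the usual way for a parent to avoid handing stray descriptors to the programs it runs.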

I can't reboot the machine right now to verify that it boots properly (I will in a couple of hours). I'll be back with a follow-up as soon as possible.
