FS#54700 - [linux-hardened] Encrypted xfs filesystem fails to boot after upgrading to 4.12
Attached to Project:
Community Packages
Opened by K.S. Bhaskar (ksbhaskar) - Tuesday, 04 July 2017, 22:13 GMT
Last edited by Daniel Micay (thestinger) - Tuesday, 11 July 2017, 20:01 GMT
Details
Description:
On my machines I have encrypted partitions for /home. After booting, I run a script that executes cryptsetup to make the decrypted partition available under /dev/mapper, e.g. /dev/mapper/home-aes, and then mount the xfs filesystem on /dev/mapper/home-aes as /home. I have done this for years with Ubuntu and Arch, and it works just fine. However, after applying a set of updates that included the 4.12 kernel, the mount attempt fails with errors such as the following:
[ 23.115192] XFS (dm-0): metadata I/O error: block 0x2 ("xfs_trans_read_buf_map") error 5 numblks 1
[ 23.115300] XFS (dm-0): metadata I/O error: block 0x324b002 ("xfs_trans_read_buf_map") error 5 numblks 1
[ 23.115380] XFS (dm-0): metadata I/O error: block 0x6496002 ("xfs_trans_read_buf_map") error 5 numblks 1
[ 23.115459] XFS (dm-0): metadata I/O error: block 0x96e1002 ("xfs_trans_read_buf_map") error 5 numblks 1
[ 23.115468] XFS (dm-0): Corruption of in-memory data detected. Shutting down filesystem
[ 23.115471] XFS (dm-0): Please umount the filesystem and rectify the problem(s)
Running xfs_repair makes no difference; the problem persists. However, the same filesystem mounts just fine on Ubuntu 17.04 (my laptops dual boot).
Additional info:
* package version(s)
* config and/or log files etc.
Steps to reproduce:
Boot and run the following commands:
cryptsetup -c aes -s 256 create home-aes /dev/nvme0n1p5
mount -t xfs -o discard,noatime /dev/mapper/home-aes /home
This task depends upon
Closed by Daniel Micay (thestinger)
Tuesday, 11 July 2017, 20:01 GMT
Reason for closing: Upstream
Additional comments about closing: This has been narrowed down as an upstream issue reproducible with CONFIG_SLUB_DEBUG_ON=y or passing the equivalent slub_debug=FZPU on the kernel line.
Since it's specific to neither linux-hardened nor Arch Linux, it needs to be reported and fixed upstream. I'd be willing to backport an upstream fix for it, but otherwise it's not something I'll be working on.
Let me know when there's an upstream fix and I'll apply it.
Further, I have an encrypted ext4 partition as well as encrypted xfs partitions. Only the encrypted xfs partitions show the failure; all other partitions (including non-encrypted ones) mount fine.
I have just upgraded another system (on the same machine), and the problem occurs as above, under the same conditions:
4.12.0-1-ARCH
$ cd /tmp/
$ dd if=/dev/zero of=disk.img bs=1M count=256
256+0 records in
256+0 records out
268435456 bytes (268 MB, 256 MiB) copied, 0.056453 s, 4.8 GB/s
$ sudo losetup /dev/loop0 disk.img
$ sudo cryptsetup -y -c aes -s 256 create home-aes /dev/loop0
Enter passphrase:
Verify passphrase:
$ sudo mkfs.xfs /dev/mapper/home-aes
meta-data=/dev/mapper/home-aes isize=512 agcount=4, agsize=16384 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=0, rmapbt=0, reflink=0
data = bsize=4096 blocks=65536, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=1
log =internal log bsize=4096 blocks=855, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
$ sudo mount -t xfs -o discard,noatime /dev/mapper/home-aes /mnt
$ sudo umount /mnt
$ sudo cryptsetup close home-aes
$ sudo cryptsetup open /dev/loop0 home-aes -c aes -s 256 --type plain
Enter passphrase:
$ sudo mount -t xfs -o discard,noatime /dev/mapper/home-aes /mnt
This is using 4.12 from mainline with Arch's 4.11 config plus all default selections for the new options, on x86_64.
Possibly an issue caused by a linux-hardened specific change.
Edit: Can you replicate my test on linux-hardened, please?
Edit2: If you cannot replicate my test with linux-hardened, can you try to replicate it with linux 4.12-1 from staging?
My pacman.conf is set for Testing, Core, Extra, Community.
mirrorlist is (mostly) set to Australian servers/mirrors.
The latest kernel on testing is 4.11.9-1, but I have not tried that... (happy to give it a go, if you want)...
Also, if you want to give me instructions for the staging setup to obtain the 4.12-1 kernel, I'm cool to try that out...
cheers
Method 1:
Add a section to pacman.conf above [testing]
[staging]
Include = /etc/pacman.d/mirrorlist
then run pacman -Syy linux linux-headers (this stops the rest of staging from being brought in),
then disable/remove the [staging] section and run pacman -Syyu to avoid a partial upgrade.
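Method 1 above can be sketched as follows. This is illustrative only: it edits a scratch copy of pacman.conf (the sed invocation and scratch path are my own, not from the thread), and the pacman commands are left commented out since they need root.

```shell
# Build a minimal scratch pacman.conf to demonstrate the edit.
cat > /tmp/pacman.conf <<'EOF'
[core]
Include = /etc/pacman.d/mirrorlist
[testing]
Include = /etc/pacman.d/mirrorlist
EOF

# Insert a [staging] section just above [testing]:
sed -i -e '/^\[testing\]/i [staging]' \
       -e '/^\[testing\]/i Include = /etc/pacman.d/mirrorlist' \
       /tmp/pacman.conf

grep -n '^\[' /tmp/pacman.conf

# For the real /etc/pacman.conf, the follow-up steps (as root) would be:
#   pacman -Syy linux linux-headers      # pull only the kernel packages
#   (remove/comment the [staging] section again)
#   pacman -Syyu                         # resync to avoid a partial upgrade
```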
Method 2:
Copy an enabled http/https mirror entry from /etc/pacman.d/mirrorlist and replace /$repo/os/$arch with /staging/os/x86_64/ (assuming x86_64).
Download linux and linux-headers and install them with pacman -U. (As the packages are installed from local files, the signatures will not be checked.)
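Method 2 amounts to a one-line URL rewrite; a sketch (the mirror hostname below is an example, not a real Arch mirror):

```shell
# A typical mirrorlist entry (single quotes keep $repo/$arch literal):
entry='Server = http://mirror.example.org/archlinux/$repo/os/$arch'

# Point it at the staging repo for x86_64:
staging=$(printf '%s\n' "$entry" | sed 's|\$repo/os/\$arch|staging/os/x86_64|')
echo "$staging"
# -> Server = http://mirror.example.org/archlinux/staging/os/x86_64

# Then fetch the linux and linux-headers packages from that URL and
# install them with: pacman -U <package files>
```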
Used method 1. Results: the standard kernel "upped" to 4.12-1, and after commenting out the [staging] and [testing] sections (I have never used the testing repos, btw) I ran -Syyu and linux-hardened went to 4.12-1b (up from "a").
tested both kernels...
Vanilla [4.12-1] kernel: sweet as, no problem picking up the encrypted xfs or ext4 volumes.
Hardened [4.12-1b] kernel: problem remains, spurious error msgs as above for the encrypted xfs filesystem(s), but no problem with encrypted ext4...
cheers
Try rebuilding the package with CONFIG_SLAB_CANARY=n, CONFIG_SLAB_SANITIZE=n, CONFIG_SLAB_SANITIZE_VERIFY=n, CONFIG_PAGE_SANITIZE=n and CONFIG_PAGE_SANITIZE_VERIFY=n in config.x86_64.
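A quick way to flip those options before rebuilding (a sketch: it demonstrates the edit on a scratch fragment; for the real build, run the same sed over config.x86_64 in the package source):

```shell
# Scratch fragment standing in for the relevant lines of config.x86_64.
cat > /tmp/config.x86_64 <<'EOF'
CONFIG_SLAB_CANARY=y
CONFIG_SLAB_SANITIZE=y
CONFIG_SLAB_SANITIZE_VERIFY=y
CONFIG_PAGE_SANITIZE=y
CONFIG_PAGE_SANITIZE_VERIFY=y
EOF

# Turn each hardening option off (=y -> =n). The anchored pattern keeps
# CONFIG_SLAB_SANITIZE from also matching CONFIG_SLAB_SANITIZE_VERIFY.
for opt in SLAB_CANARY SLAB_SANITIZE SLAB_SANITIZE_VERIFY \
           PAGE_SANITIZE PAGE_SANITIZE_VERIFY; do
    sed -i "s/^CONFIG_${opt}=y/CONFIG_${opt}=n/" /tmp/config.x86_64
done

cat /tmp/config.x86_64
```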
@thestinger can you not replicate the issue locally?
Edit:
CONFIG_PAGE_SANITIZE=y and CONFIG_PAGE_SANITIZE_VERIFY=y added back; issue still not reproduced.
Edit2:
CONFIG_SLAB_CANARY=y added back; issue reproduced.
SGI XFS with ACLs, security attributes, realtime, no debug enabled
XFS (dm-2): Mounting V5 Filesystem
XFS (dm-2): Ending clean mount
XFS (dm-2): metadata I/O error: block 0x2 ("xfs_trans_read_buf_map") error 5 numblks 1
XFS (dm-2): metadata I/O error: block 0x20002 ("xfs_trans_read_buf_map") error 5 numblks 1
XFS (dm-2): metadata I/O error: block 0x40002 ("xfs_trans_read_buf_map") error 5 numblks 1
XFS (dm-2): metadata I/O error: block 0x60002 ("xfs_trans_read_buf_map") error 5 numblks 1
XFS (dm-2): Error -5 reserving per-AG metadata reserve pool.
XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 1017 of file fs/xfs/xfs_fsops.c. Return address = 0xffffffffc0979280
XFS (dm-2): Corruption of in-memory data detected. Shutting down filesystem
XFS (dm-2): Please umount the filesystem and rectify the problem(s)
Edit3:
An error under here https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/xfs/libxfs/xfs_ag_resv.c?h=v4.12#n232 ?
Other people are capable of doing work too, and here's an opportunity to do that. I'm fairly convinced it's a memory corruption bug, perhaps being caught because something is checking beyond the bounds of an object and a canary is put there instead of it being padding. It's possible but unlikely that it's a bug in the SLAB_CANARY feature. It works with everything else, including ksize(...).
> CONFIG_SLAB_CANARY=y added back issue reproduced
Try with CONFIG_SLAB_CANARY=y, CONFIG_SLAB_SANITIZE=n, CONFIG_SLAB_SANITIZE_VERIFY=n, CONFIG_PAGE_SANITIZE=n and CONFIG_PAGE_SANITIZE_VERIFY=n.
Edit:
Do you think KASAN would be able to detect the out of bounds access or a kprobe inside xfs_ag_resv_init?
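On the KASAN question: enabling it is a config-only change. A sketch, again demonstrated on a scratch fragment (which options are available depends on the kernel version and architecture, and KASAN builds are noticeably slower):

```shell
# Options one would add to the kernel config to try KASAN
# (outline mode works with older gcc than inline mode does).
cat > /tmp/kasan.config <<'EOF'
CONFIG_KASAN=y
CONFIG_KASAN_OUTLINE=y
CONFIG_SLUB_DEBUG=y
CONFIG_STACKTRACE=y
EOF
grep -c '=y' /tmp/kasan.config
```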
Edit:
Rebuilt with CONFIG_XFS_WARN=y set as well; the first run did not reproduce the issue, the second run did.
No additional xfs-related output was generated with /proc/sys/kernel/printk set to 7.
Edit2:
After 10 test runs 9 reproduced the issue.
Edit3:
Rebuilt without CONFIG_XFS_WARN=y but with CONFIG_XFS_DEBUG=y; the first run did not reproduce the issue, the second run did.
No additional xfs-related output was generated with /proc/sys/kernel/printk set to 7 (identical behaviour to that noted in the first Edit).
Triggered the issue; no additional output in dmesg.
Should the issue be reproducible with linux 4.12-2 with slub_debug on the command line? I was not able to reproduce it that way (test sample of one).
Edit:
Reproduced on second run linux 4.12-2 with the command line slub_debug.
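For anyone wanting to test the stock kernel, the closing note's equivalent is slub_debug=FZPU on the kernel command line. A sketch of the GRUB route, demonstrated on a scratch copy (paths assume GRUB; adjust for your bootloader):

```shell
# Scratch copy standing in for /etc/default/grub.
cat > /tmp/default-grub <<'EOF'
GRUB_CMDLINE_LINUX_DEFAULT="quiet"
EOF

# Append slub_debug=FZPU to the default kernel command line:
sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT="\([^"]*\)"/GRUB_CMDLINE_LINUX_DEFAULT="\1 slub_debug=FZPU"/' \
    /tmp/default-grub

cat /tmp/default-grub
# -> GRUB_CMDLINE_LINUX_DEFAULT="quiet slub_debug=FZPU"

# For the real file: make the same edit to /etc/default/grub, then
#   grub-mkconfig -o /boot/grub/grub.cfg
# and reboot.
```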
So ksbhaskar or paul.zrexx12r can report the issue upstream provided they can replicate my findings.
Edit2:
http://oss.sgi.com/bugzilla/ (currently appears to have a backend issue)
http://xfs.org/index.php/XFS_FAQ#Q:_Where_can_I_find_documentation_about_XFS.3F (if the above is not functioning, try the IRC channel linked here)
http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
Edit3:
"https://bugzilla.kernel.org/buglist.cgi?product=File%20System&component=XFS&resolution=---" (possibly use the kernel bugzilla instead; URL quoted because flyspray is not parsing it correctly)