Please read this before reporting a bug:
https://wiki.archlinux.org/title/Bug_reporting_guidelines
Do NOT report bugs when a package is just outdated, or it is in the AUR. Use the 'flag out of date' link on the package page, or the Mailing List.
REPEAT: Do NOT report bugs for outdated packages!
https://wiki.archlinux.org/title/Bug_reporting_guidelines
Do NOT report bugs when a package is just outdated, or it is in the AUR. Use the 'flag out of date' link on the package page, or the Mailing List.
REPEAT: Do NOT report bugs for outdated packages!
FS#50785 - [linux] Testing kernel 4.7.3-2 sometimes reboots during resume from successful hibernation (STD)
Attached to Project:
Arch Linux
Opened by Aaron (a3n) - Thursday, 15 September 2016, 06:32 GMT
Last edited by Doug Newgard (Scimmia) - Tuesday, 17 October 2017, 13:28 GMT
Opened by Aaron (a3n) - Thursday, 15 September 2016, 06:32 GMT
Last edited by Doug Newgard (Scimmia) - Tuesday, 17 October 2017, 13:28 GMT
|
DetailsDescription: Occasionally, after resuming (pressing the power on button) after a successful hibernation (STD):
* the resume starts but does not complete, * instead the system reboots while resume is in progress, * and rather than getting my pre-hibernation session, I have to log in from scratch. * Hibernated data is lost. This does NOT involve black screens, inactive mice, hanging, unsuccessful poweroff and immediate reboot, or other inoperability, which is what most searches turn up. The system works fine, just occasionally I'll get a reboot during resume and lose hibernated data. Attached files: (with clumsy analasys in the reproduction section below) * 473FailureLog.txt, journal of boot session with six successful hibernations, and the failed one that reboots. * 473RebootAfterFailureLog.txt, journal of boot session after the spontaneous reboot. * netconsole.txt, the netcat capture file from an external laptop monitoring netconsole output. In the netconsole.txt capture file I noticed this: # grep "PM: Wrote" netconsole.txt [ 1018.058475] PM: Wrote 998548 kbytes in 7.23 seconds (138.11 MB/s) [ 1678.715363] PM: Wrote 1164040 kbytes in 8.33 seconds (139.74 MB/s) [ 2338.250132] PM: Wrote 1151388 kbytes in 7.49 seconds (153.72 MB/s) [ 2999.451110] PM: Wrote 1233736 kbytes in 8.75 seconds (140.99 MB/s) [ 3659.781817] PM: Wrote 1284228 kbytes in 9.47 seconds (135.61 MB/s) [ 4319.373342] PM: Wrote 1270104 kbytes in 9.29 seconds (136.71 MB/s) [ 4981.954589] PM: Wrote 1511712 kbytes in 11.98 seconds (126.18 MB/s) The size of the image written increases (but not every time), and the time it takes to write increases every time. While this was happening I was just reading the web. Don't know if that's a clue or not. I have seen this behavior in: * testing kernel 4.7.3-2 (this report) * core kernel 4.7.2-1 * extra kernel linux-zen 4.7.2-1 It's hard to prove a negative, but I have not seen this behavior in: * core kernel linux-lts 4.4.20-1 I've searched this bug list, Arch forums, kernel.org bug list, and the WWW. The Arch forum has one report of similar behavior, but no solved post and no indication of activity. Most search hits elsewhere are either concluded with "my bad!" or the behavior ends in some kind of inoperability. The only thing happening in my report is reboot during resume, and loss of hibernated data; the system works fine otherwise. Additional info: * Laptop: oldish HP Pavilion G7 * BIOS/MBR * 6G RAM, 12G swap volume * Single 750G disk, single partition * Entire partition is LVM on LUKS (FDE) * 2 LVM volumes: root and swap * /boot is inside the root volume, no separate boot partition # lsblk -f NAME FSTYPE LABEL UUID MOUNTPOINT sda └─sda1 crypto_LUKS 486d0088-c99e-46f7-b4a7-b8b06f68eaff └─VolGrp LVM2_member zKBHeW-fzai-yVzR-fBuR-vkSg-hR3P-lZ58QI ├─VolGrp-swap swap 5f24cce5-8861-4e22-9d63-5f946dbde79f [SWAP] └─VolGrp-root ext4 cede4248-86ab-4a8a-9476-7b9c34e65b6e / sr0 # dmesg [ 0.000000] microcode: microcode updated early to revision 0x29, date = 2013-06-12 [ 0.000000] Linux version 4.7.3-2-ARCH (builduser@tobias) (gcc version 6.2.1 20160830 (GCC) ) #1 SMP PREEMPT Thu Sep 8 09:44:02 CEST 2016 [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-linux root=/dev/mapper/VolGrp-root rw cryptdevice=/dev/sda1:VolGrp resume=/dev/mapper/VolGrp-swap ignore_loglevel initcall_debug no_console_suspend # lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 4 On-line CPU(s) list: 0-3 Thread(s) per core: 2 Core(s) per socket: 2 Socket(s): 1 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 42 Model name: Intel(R) Core(TM) i3-2350M CPU @ 2.30GHz Stepping: 7 CPU MHz: 818.560 CPU max MHz: 2300.0000 CPU min MHz: 800.0000 BogoMIPS: 4591.93 Virtualization: VT-x L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 3072K NUMA node0 CPU(s): 0-3 Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer xsave avx lahf_lm epb tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm arat pln pts # lsinitcpio -c /boot/initramfs-linux.img |egrep -ve "^#|^$" MODULES="" BINARIES="" FILES="/crypto_keyfile.bin" HOOKS="base udev autodetect modconf block encrypt lvm2 resume filesystems keyboard fsck" # egrep -ve "^#|^$" /etc/default/grub GRUB_ENABLE_CRYPTODISK=y GRUB_DEFAULT=0 GRUB_TIMEOUT=4 GRUB_DISTRIBUTOR="Arch" GRUB_CMDLINE_LINUX_DEFAULT="ignore_loglevel initcall_debug no_console_suspend" GRUB_CMDLINE_LINUX="cryptdevice=/dev/sda1:VolGrp resume=/dev/mapper/VolGrp-swap" GRUB_PRELOAD_MODULES="lvm part_gpt part_msdos" GRUB_TERMINAL_INPUT=console GRUB_GFXMODE=auto GRUB_GFXPAYLOAD_LINUX=keep GRUB_DISABLE_RECOVERY=true Steps to reproduce: Hibernate over and over until you get a reboot during resume. netconsole (serial console over ethernet emulation) is enabled as a module, to capture messages on another laptop. Since it's a module, it won't turn on the network, so it has to be started after the network. I have not tried compiling it in to get earlier messages. # cat /etc/systemd/system/netconsole.service [Unit] Description=kernel kmsg to network Description=netconsole Requires=network.target After=network.target [Service] ExecStart=-/usr/bin/modprobe netconsole [Install] WantedBy=multi-user.target # cat /etc/modprobe.d/netconsole.conf options netconsole netconsole=6665@192.168.5.100/eno1,6666@192.168.5.104/64:31:50:5b:7d:b2 I set up a systemd user timer to hibernate the system every ten minutes, since I don't know how to otherwise trigger this behavior. I enable and start the timer, then wait for the bug to occur. After each hibernate's poweroff, I start the system and input my passphrase. After the bug occurs and I've logged in, I stop and disable the timer, read the logs and think. Thinking hasn't paid off yet. % ls -l ~/.config/systemd/user total 8 -rw-r--r-- 1 aaron aaron 155 Sep 2 15:49 hiboften.service -rw-r--r-- 1 aaron aaron 139 Sep 8 23:56 hiboften.timer % cat ~/.config/systemd/user/hiboften.timer [Unit] Description=hiboften.timer [Timer] OnActiveSec=10min OnUnitActiveSec=10min Unit=hiboften.service [Install] WantedBy=timers.target % cat ~/.config/systemd/user/hiboften.service [Unit] Description=hiboften.service [Service] Type=simple ExecStartPre=/usr/bin/logger "Starting hiboften.service" ExecStart=/usr/bin/systemctl hibernate Besides doing other searches, this file is grep input to show the hibernation cycles and the failure in the system's journal: # cat searchHibFail.grep save_microcode_in_initrd PM: Hibernation image not present or could not be loaded. Starting Reboot... initcall init_netconsole kernel: Command line: BOOT_IMAGE= kernel: Linux version Starting hiboften.service... -- reboot PM: Hibernation mode set to 'platform' PM: Creating hibernation image: PM: image restored successfully : link up Here I search the journal, using the grep input file. I show from just before the last on-purpose commanded reboot, all the hibernations including the bug, and the reboot after the bug. I see nothing clueful. There are six successful hibernations, followed by a failure and reboot. Note that just because you don't see hibernation image creation and restoration in the failed hibernation doesn't prove they didn't happen, they just may not have made it into the log. When I was running the core 4.7.2-1 kernel, these were reported as happening. # journalctl --no-pager |egrep -if searchHibFail.grep ... Sep 14 17:20:16 thing1 systemd[1]: Starting Reboot... -- Reboot -- Sep 14 17:22:16 thing1 kernel: Linux version 4.7.3-2-ARCH (builduser@tobias) (gcc version 6.2.1 20160830 (GCC) ) #1 SMP PREEMPT Thu Sep 8 09:44:02 CEST 2016 Sep 14 17:22:16 thing1 kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-linux root=/dev/mapper/VolGrp-root rw cryptdevice=/dev/sda1:VolGrp resume=/dev/mapper/VolGrp-swap ignore_loglevel initcall_debug no_console_suspend Sep 14 17:22:16 thing1 kernel: calling save_microcode_in_initrd+0x0/0x36 @ 1 Sep 14 17:22:16 thing1 kernel: initcall save_microcode_in_initrd+0x0/0x36 returned 0 after 0 usecs Sep 14 17:22:17 thing1 kernel: PM: Hibernation image not present or could not be loaded. Sep 14 17:22:17 thing1 kernel: PM: Hibernation image not present or could not be loaded. Sep 14 17:22:32 thing1 kernel: r8169 0000:02:00.0 eno1: link up Sep 14 17:22:38 thing1 kernel: initcall init_netconsole+0x0/0x1000 [netconsole] returned 0 after 30472 usecs Sep 14 17:38:24 thing1 systemd[482]: Starting hiboften.service... Sep 14 17:38:24 thing1 kernel: PM: Hibernation mode set to 'platform' Sep 14 17:40:27 thing1 kernel: PM: Creating hibernation image: Sep 14 17:40:27 thing1 kernel: PM: Image restored successfully. Sep 14 17:40:28 thing1 kernel: r8169 0000:02:00.0 eno1: link up Sep 14 17:51:23 thing1 systemd[482]: Starting hiboften.service... Sep 14 17:51:23 thing1 kernel: PM: Hibernation mode set to 'platform' Sep 14 17:55:36 thing1 kernel: PM: Creating hibernation image: Sep 14 17:55:36 thing1 kernel: PM: Image restored successfully. Sep 14 17:55:37 thing1 kernel: r8169 0000:02:00.0 eno1: link up Sep 14 18:06:33 thing1 systemd[482]: Starting hiboften.service... Sep 14 18:06:33 thing1 kernel: PM: Hibernation mode set to 'platform' Sep 14 18:08:29 thing1 kernel: PM: Creating hibernation image: Sep 14 18:08:29 thing1 kernel: PM: Image restored successfully. Sep 14 18:08:30 thing1 kernel: r8169 0000:02:00.0 eno1: link up Sep 14 18:19:25 thing1 systemd[482]: Starting hiboften.service... Sep 14 18:19:25 thing1 kernel: PM: Hibernation mode set to 'platform' Sep 14 18:21:14 thing1 kernel: PM: Creating hibernation image: Sep 14 18:21:14 thing1 kernel: PM: Image restored successfully. Sep 14 18:21:14 thing1 kernel: r8169 0000:02:00.0 eno1: link up Sep 14 18:32:10 thing1 systemd[482]: Starting hiboften.service... Sep 14 18:32:10 thing1 kernel: PM: Hibernation mode set to 'platform' Sep 14 18:34:32 thing1 kernel: PM: Creating hibernation image: Sep 14 18:34:32 thing1 kernel: PM: Image restored successfully. Sep 14 18:34:33 thing1 kernel: r8169 0000:02:00.0 eno1: link up Sep 14 18:45:28 thing1 systemd[482]: Starting hiboften.service... Sep 14 18:45:28 thing1 kernel: PM: Hibernation mode set to 'platform' Sep 14 18:47:53 thing1 kernel: PM: Creating hibernation image: Sep 14 18:47:53 thing1 kernel: PM: Image restored successfully. Sep 14 18:47:54 thing1 kernel: r8169 0000:02:00.0 eno1: link up Sep 14 18:58:49 thing1 systemd[482]: Starting hiboften.service... Sep 14 18:58:49 thing1 kernel: PM: Hibernation mode set to 'platform' -- Reboot -- Sep 14 19:02:23 thing1 kernel: Linux version 4.7.3-2-ARCH (builduser@tobias) (gcc version 6.2.1 20160830 (GCC) ) #1 SMP PREEMPT Thu Sep 8 09:44:02 CEST 2016 Sep 14 19:02:23 thing1 kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-linux root=/dev/mapper/VolGrp-root rw cryptdevice=/dev/sda1:VolGrp resume=/dev/mapper/VolGrp-swap ignore_loglevel initcall_debug no_console_suspend Sep 14 19:02:23 thing1 kernel: calling save_microcode_in_initrd+0x0/0x36 @ 1 Sep 14 19:02:23 thing1 kernel: initcall save_microcode_in_initrd+0x0/0x36 returned 0 after 0 usecs Sep 14 19:02:23 thing1 kernel: PM: Hibernation image not present or could not be loaded. Sep 14 19:02:23 thing1 kernel: PM: Hibernation image not present or could not be loaded. Sep 14 19:02:39 thing1 kernel: r8169 0000:02:00.0 eno1: link up Sep 14 19:02:46 thing1 kernel: initcall init_netconsole+0x0/0x1000 [netconsole] returned 0 after 28778 usecs On the external laptop I was running the Arch installer .iso, which conveniently has netcat installed. I captured the file, on the external laptop, like so: # nc -u -l -p 6666 |tee netconsole.txt On the external laptop that receives netconsole messages, here is the last successful hibernation, followed by the failed hibernation. Extra debugging messages were filtered out; see the attached netconsole.txt file for full messages. [ 3660.002152] sd 0:0:0:0: [sda] Synchronizing SCSI cache [ 3652.068733] r8169 0000:02:00.0 eno1: link up [ 4307.109118] PM: Hibernation mode set to 'platform' [ 4307.319193] PM: Syncing filesystems ... done. [ 4307.690014] Freezing user space processes ... (elapsed 0.001 seconds) done. [ 4307.694411] Double checking all user space processes after OOM killer disable... (elapsed 0.000 seconds) [ 4307.697563] PM: Marking nosave pages: [mem 0x00000000-0x00000fff] [ 4307.700342] PM: Marking nosave pages: [mem 0x0009d000-0x000fffff] [ 4307.702827] PM: Marking nosave pages: [mem 0xace3f000-0xacffefff] [ 4307.705275] PM: Marking nosave pages: [mem 0xad000000-0xffffffff] [ 4307.710041] PM: Basic memory bitmaps created [ 4307.712551] PM: Preallocating image memory... done (allocated 318017 pages) [ 4308.053875] PM: Allocated 1272068 kbytes in 0.34 seconds (3741.37 MB/s) [ 4308.054992] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. [ 4311.814841] r8169 0000:02:00.0 eno1: link up [ 4311.873217] PM: Image saving progress: 60% [ 4312.159023] PM: Image saving progress: 70% [ 4312.494732] PM: Image saving progress: 80% [ 4312.794182] PM: Image saving progress: 90% [ 4313.004730] PM: Image saving progress: 100% [ 4319.370904] PM: Image saving done. [ 4319.373342] PM: Wrote 1270104 kbytes in 9.29 seconds (136.71 MB/s) [ 4319.376336] PM: S| [ 4319.579868] sd 0:0:0:0: [sda] Synchronizing SCSI cache [ 4319.580036] sd 0:0:0:0: [sda] Stopping disk [ 4311.808394] r8169 0000:02:00.0 eno1: link up [ 4967.042073] PM: Hibernation mode set to 'platform' [ 4967.256155] PM: Syncing filesystems ... done. [ 4967.525124] Freezing user space processes ... (elapsed 0.001 seconds) done. [ 4967.529389] Double checking all user space processes after OOM killer disable... (elapsed 0.000 seconds) [ 4967.531627] PM: Marking nosave pages: [mem 0x00000000-0x00000fff] [ 4967.533638] PM: Marking nosave pages: [mem 0x0009d000-0x000fffff] [ 4967.535627] PM: Marking nosave pages: [mem 0xace3f000-0xacffefff] [ 4967.537623] PM: Marking nosave pages: [mem 0xad000000-0xffffffff] [ 4967.541421] PM: Basic memory bitmaps created [ 4967.543314] PM: Preallocating image memory... done (allocated 378172 pages) [ 4967.889844] PM: Allocated 1512688 kbytes in 0.34 seconds (4449.08 MB/s) [ 4967.890974] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. [ 4971.522655] r8169 0000:02:00.0 eno1: link up [ 4971.568146] PM: Image saving progress: 40% [ 4971.845575] PM: Image saving progress: 50% [ 4972.205031] PM: Image saving progress: 60% [ 4972.549352] PM: Image saving progress: 70% [ 4972.952281] PM: Image saving progress: 80% [ 4973.361944] PM: Image saving progress: 90% [ 4973.615032] PM: Image saving progress: 100% [ 4981.952255] PM: Image saving done. [ 4981.954589] PM: Wrote 1511712 kbytes in 11.98 seconds (126.18 MB/s) [ 4981.958070] PM: S| [ 4982.136313] sd 0:0:0:0: [sda] Synchronizing SCSI cache [ 4982.136476] sd 0:0:0:0: [sda] Stopping disk That's all I got. |
This task depends upon
Closed by Doug Newgard (Scimmia)
Tuesday, 17 October 2017, 13:28 GMT
Reason for closing: No response
Tuesday, 17 October 2017, 13:28 GMT
Reason for closing: No response
473FailureLog.txt
I also don't think it's related to lvm or luks. I reinstalled just a bare, non-encrypted, non-lvm system, with no X. Just what you get after the vanilla installation.
I then ran the following script from the root (the only) account. I had to turn on network wake in the bios settings.
for i in $(seq 100); do
rtcwake –m disk –s 10
sleep 5
done
After 30 something hibernations and resumes, I got the failed to resume, reboot instead. Unfortunately that laptop is in the shop (failed fan bearings, it's an old laptop) so I don't have the logs.
I got the script from here: https://01.org/blogs/rzhang/2015/best-practice-debug-linux-suspend/hibernate-issues