FS#44807 - [linux] Unreliable resume from hibernation

Attached to Project: Arch Linux
Opened by Milan Oravec (migo) - Friday, 01 May 2015, 19:25 GMT
Last edited by Eli Schwartz (eschwartz) - Monday, 02 October 2017, 20:33 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To Tobias Powalowski (tpowa)
Architecture x86_64
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 7
Private No

Details

Description: Resuming from hibernation is unreliable: the computer hibernates normally and turns off, but after powering on again to resume from the hibernation image it hangs on a black screen with a cursor in the left corner (during the resume process the cursor blinks, and at the point where it should switch to X it stops blinking and the computer hangs). When resuming for the first time with no programs running in GNOME it resumes OK, but when more memory is in use (running Firefox, Thunderbird) it hangs as described.

This bug is present from kernel version 3.19 and is present in 4.0 too. When I manually downgrade the kernel to 3.18.3 (latest Arch package available), hibernation works flawlessly; I tested it for one month, hibernating and resuming the computer daily.

On my other laptop, a Sony VAIO Z21xn, this bug is not present: it hibernates and resumes OK with recent versions of the Arch kernel package, and it is 8-year-old hardware. This bug is model specific :( How can I debug this bug further? Adding no_console_suspend to the boot parameters causes a reboot at the point where it "normally" hangs.

Computer: Sony VAIO PRO 11 touch screen model, lspci:

00:00.0 Host bridge: Intel Corporation Haswell-ULT DRAM Controller (rev 09)
00:02.0 VGA compatible controller: Intel Corporation Haswell-ULT Integrated Graphics Controller (rev 09)
00:03.0 Audio device: Intel Corporation Haswell-ULT HD Audio Controller (rev 09)
00:14.0 USB controller: Intel Corporation 8 Series USB xHCI HC (rev 04)
00:16.0 Communication controller: Intel Corporation 8 Series HECI #0 (rev 04)
00:1b.0 Audio device: Intel Corporation 8 Series HD Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 8 Series PCI Express Root Port 3 (rev e4)
00:1c.3 PCI bridge: Intel Corporation 8 Series PCI Express Root Port 4 (rev e4)
00:1d.0 USB controller: Intel Corporation 8 Series USB EHCI #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation 8 Series LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 8 Series SATA Controller 1 [AHCI mode] (rev 04)
00:1f.3 SMBus: Intel Corporation 8 Series SMBus Controller (rev 04)
01:00.0 Network controller: Intel Corporation Wireless 7260 (rev 6b)



Additional info:
* package version(s) linux 3.19.x, 4.0.1
* config and/or log files etc. Nothing special here, I'll add when asked.


Steps to reproduce:

Hibernate the computer (Sony VAIO Pro 11) with running programs; resume will fail on kernel versions higher than 3.18.x.

Closed by  Eli Schwartz (eschwartz)
Monday, 02 October 2017, 20:33 GMT
Reason for closing:  Fixed
Additional comments about closing:  fix included in linux 4.8
Comment by Milan Oravec (migo) - Sunday, 07 June 2015, 20:52 GMT
Hi, I'm debugging with these boot parameters: rw resume=/dev/sda6 acpi_sleep=nonvs libata.force=noncq drm.debug=0xe no_console_suspend initcall_debug

What logs would be helpful?

THANX!
Comment by Chris Magnuson (hourglasssand) - Thursday, 10 September 2015, 02:43 GMT
I have a Dell Latitude E6540 and I am having the same trouble. I can hibernate and unhibernate OK if I don't have anything open, but if I open Chrome, Vim, and a terminal, it will hang as described and will not unhibernate.

4.1.6-1-ARCH #1 SMP PREEMPT Mon Aug 17 08:52:28 CEST 2015 x86_64 GNU/Linux

lspci:

00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor DRAM Controller (rev 06)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller (rev 06)
00:02.0 VGA compatible controller: Intel Corporation 4th Gen Core Processor Integrated Graphics Controller (rev 06)
00:03.0 Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller (rev 06)
00:14.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI (rev 04)
00:16.0 Communication controller: Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1 (rev 04)
00:16.3 Serial controller: Intel Corporation 8 Series/C220 Series Chipset Family KT Controller (rev 04)
00:19.0 Ethernet controller: Intel Corporation Ethernet Connection I217-LM (rev 04)
00:1a.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #2 (rev 04)
00:1b.0 Audio device: Intel Corporation 8 Series/C220 Series Chipset High Definition Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #1 (rev d4)
00:1c.2 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #3 (rev d4)
00:1c.4 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #5 (rev d4)
00:1c.5 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #6 (rev d4)
00:1c.6 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #7 (rev d4)
00:1c.7 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #8 (rev d4)
00:1d.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation QM87 Express LPC Controller (rev 04)
00:1f.2 RAID bus controller: Intel Corporation 82801 Mobile SATA Controller [RAID mode] (rev 04)
00:1f.3 SMBus: Intel Corporation 8 Series/C220 Series Chipset Family SMBus Controller (rev 04)
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Mars XTX [Radeon HD 8790M] (rev ff)
03:00.0 Network controller: Intel Corporation Centrino Advanced-N 6235 (rev 24)
0e:00.0 SD Host controller: O2 Micro, Inc. SD/MMC Card Reader Controller (rev 01)
Comment by robert r (crobe) - Wednesday, 14 October 2015, 17:18 GMT
I am seeing the same completely unspecific problem here on an Intel Haswell machine with Intel HD 4400 graphics.

Suspend to disk/resume cycles work during short intervals, but resuming after several hours does not work. System is completely locked up, no cursor, no ping, display backlight seems on. Kernel is 4.1.10-2-lts 64 bit on Clevo W840SU with Core i7-4500U.

Suspend to RAM works without problems. Same problem with linux 4.2.3.

Comment by Milan Oravec (migo) - Friday, 16 October 2015, 19:50 GMT
Hi all, I have another notebook with similar hibernation problems, an Acer Aspire Nitro V17. One question: are all of you using UEFI? On my older Sony, which uses BIOS, hibernation works flawlessly.

Has anybody tried the 3.18 kernel? It is the last working version for me on these problematic notebooks.
Comment by Milan Oravec (migo) - Saturday, 17 October 2015, 16:32 GMT
nobody?
Comment by robert r (crobe) - Thursday, 22 October 2015, 13:35 GMT
The system did not hibernate with Kernel 3.18 at all for me on an otherwise up to date system, so I tried the LTS 4.1.10-2. With that kernel the system does not resume properly as stated above. Black screen, no ping, no log.

So I upgraded to kernel 4.2.3 and am now experimenting with
1. Disabling the compositor for kwin_x11 (Alt+Shift+F12), as the Intel driver has different problems
2. Unloading the iwlwifi module, as that has been a problem in the past
prior to suspending. It worked flawlessly for 3 cycles now, one being overnight.
Comment by robert r (crobe) - Wednesday, 04 November 2015, 20:49 GMT
Small update: never mind, the problem persisted with kernel 4.2.3 and is there with 4.2.5 too.
Comment by Milan Oravec (migo) - Monday, 09 November 2015, 18:44 GMT
Update: today I tested the 4.2.5 Arch kernel. It survives one hibernation cycle. On the second resume it panics; I attach a photo of the panic screen. I'm hibernating using systemd from the GNOME environment.
Comment by robert r (crobe) - Sunday, 06 December 2015, 18:10 GMT
I started digging further into this as it won't fix itself.

After echoing 1 > /sys/power/pm_trace something may be written into the RTC as a hint. After resume failure reboot the system immediately and run dmesg to get some output. Note that you may need to set the clock back to reasonable values with date, as NTP won't correct the date when it gets set to the year 2064.

Unfortunately I am out of luck, does it also work for hibernation?

[ 0.836026] Magic number: 0:579:450
[ 0.836035] machinecheck: hash matches
[ 0.836202] rtc_cmos 00:02: setting system clock to 2064-09-26 21:25:28 UTC (2989689928)

Info: https://www.kernel.org/doc/Documentation/power/s2ram.txt
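
The pm_trace procedure described in this comment can be sketched as a small helper. This is only a rough sketch: the function name pm_trace_hints is made up for illustration, and all it does is filter kernel log text from standard input for the "Magic number" and "hash matches" lines shown above.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption): keep only the
# pm_trace hint lines from kernel log text supplied on standard input.
pm_trace_hints() {
    grep -E 'Magic number:|hash matches'
}

# Typical use, as root, on the affected machine (not executed here):
#   echo 1 > /sys/power/pm_trace   # arm the RTC-based tracing
#   ...trigger the sleep/resume cycle under test; after a failed resume,
#   reboot immediately...
#   dmesg | pm_trace_hints         # then read the matching device/driver
```

Any "hash matches" line names the last device or source location recorded before the failure, which is the hint worth reporting.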
Comment by Florent Thiery (fthiery) - Wednesday, 13 January 2016, 13:15 GMT
Also having this inconsistent behaviour; it tends to break from kernel to kernel upgrades, and seems related to the number of apps being opened when hibernating.

When it fails to resume, i see the following log (/dev/sda2 is my swap partition)

janv. 13 09:30:47 localhost systemd-hibernate-resume[162]: Could not resume from '/dev/sda2' (8:2).
janv. 13 09:30:47 localhost systemd[1]: Started Resume from hibernation using device /dev/sda2.
janv. 13 09:30:47 localhost systemd[1]: Reached target Local File Systems (Pre).
janv. 13 09:30:47 localhost systemd-fsck[164]: fsck.ext4 doesn't exist, not checking file system on /dev/disk/by-uuid/42a163ab-0771-48ac-b296-caf8f549dec0.
janv. 13 09:30:47 localhost systemd[1]: Started File System Check on /dev/disk/by-uuid/42a163ab-0771-48ac-b296-caf8f549dec0.
janv. 13 09:30:47 localhost systemd[1]: Mounting /sysroot...
janv. 13 09:30:47 localhost kernel: usb 3-3: new low-speed USB device number 2 using xhci_hcd
janv. 13 09:30:47 localhost kernel: PM: Starting manual resume from disk
janv. 13 09:30:47 localhost kernel: PM: Hibernation image partition 8:2 present
janv. 13 09:30:47 localhost kernel: PM: Looking for hibernation image.
janv. 13 09:30:47 localhost kernel: PM: Image not found (code -22)
janv. 13 09:30:47 localhost kernel: PM: Hibernation image not present or could not be loaded.

Do the lines above imply that writing the hibernation file to swap failed ? Could the image file be too small ? Could image_size be related ?
$ cat /sys/power/image_size
6677938176

According to the kernel documentation (https://www.kernel.org/doc/Documentation/power/interface.txt), it is "an upper limit of the image size, in bytes [...] set to 2/5 of available RAM by default"; does it mean that if the actual memory usage (e.g. lots of open tabs in Firefox/Chrome) is bigger than this value, the image will be incomplete ?
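
To make the "2/5 of available RAM" default concrete, here is a rough sketch of the arithmetic with the figures posted in this comment. The helper name two_fifths is made up for illustration, and the RAM total is the "Mem: total" value from the free -b output further down; the kernel derives its default from its own view of available memory, so an exact match is not expected.

```shell
#!/bin/sh
# Hypothetical helper: integer 2/5 of a byte count.
two_fifths() {
    echo $(( $1 * 2 / 5 ))
}

mem_total=16703655936    # "Mem: total" from the free -b output in this comment
image_size=6677938176    # value read from /sys/power/image_size

default_guess=$(two_fifths "$mem_total")
echo "2/5 of RAM: $default_guess bytes; reported image_size: $image_size bytes"
```

The two numbers differ by only a few megabytes, which is consistent with image_size still being at its default here. The same kernel document also says that if the limit cannot be met, the kernel tries to hibernate anyway with the smallest image possible.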

I have the following HOOKS enabled (and i do run mkinitcpio -p linux):
$ grep HOOKS /etc/mkinitcpio.conf | tail -n 1
HOOKS="base systemd autodetect modconf block filesystems keyboard"

I do have the required resume=disk boot line in /boot/grub/grub.cfg:
/boot/vmlinuz-linux root=UUID=42a163ab-0771-48ac-b296-caf8f549dec0 rw quiet resume=/dev/sda2

My swap partition is bigger than the RAM.

# LANG=C swapon --show --bytes
NAME TYPE SIZE USED PRIO
/dev/sda2 partition 17668501504 0 -1

# LANG=C free -b
total used free shared buff/cache available
Mem: 16703655936 5681102848 5604241408 790388736 5418311680 10118524928
Swap: 17668501504 0 17668501504

$ ls /dev/disk/by-uuid/ -l
total 0
lrwxrwxrwx 1 root root 10 13 janv. 09:30 403f654c-925b-4ac3-87c8-dd0fc85d0e9a -> ../../sda2
lrwxrwxrwx 1 root root 10 13 janv. 09:30 42a163ab-0771-48ac-b296-caf8f549dec0 -> ../../sda1

Running Haswell i7-4771 with Intel GPU
Comment by Florent Thiery (fthiery) - Wednesday, 13 January 2016, 13:46 GMT
For what it's worth, i tried to increase /sys/power/image_size (using echo bytes > /sys/power/image_size) to both my RAM size, and my swap partition size, and the problem still happens.
Comment by Florent Thiery (fthiery) - Wednesday, 13 January 2016, 14:27 GMT
Comment by robert r (crobe) - Monday, 18 January 2016, 20:18 GMT
Thanks for the Intel guide, that was very interesting.
I got a little bit further using the RTC method on 4.1.15-1-lts:

[ 0.760976] Magic number: 0:528:178
[ 0.760982] hash matches drivers/base/power/main.c:1063
[ 0.761058] acpi device:0e: hash matches
[ 0.761080] platform: hash matches
[ 0.761137] rtc_cmos 00:02: setting system clock to 2004-04-22 12:10:49 UTC (1082635849)

Unfortunately this looks like a basic problem :/

Additionally the system resumes, loads the image, the screen goes black just right before the system should reappear and nothing further happens, so initcall_debug or no_console_suspend are useless.

All tests from pm_test are successful 10 times in a row, so the problem must be somewhere when the system gets switched off.
Comment by Florent Thiery (fthiery) - Friday, 05 February 2016, 09:01 GMT
Just upgraded to 4.4.1, successfully came out of hibernation twice. A little early to say this issue is fixed, but there's hope :)
Comment by Milan Oravec (migo) - Saturday, 06 February 2016, 11:19 GMT
Hi, no luck here. :( Black screen on first resume, running a terminal, Thunderbird and Firefox with 20 open tabs. The system was fully updated beforehand.
Comment by robert r (crobe) - Monday, 08 February 2016, 19:08 GMT
No luck here either... First problem was the Intel microcode updates:

[ 0.650905] Magic number: 1:392:320
[ 0.651826] misc microcode: hash matches
[ 0.652803] platform microcode: hash matches

After disabling microcode updates in GRUB same roulette as before. Sometimes it works, sometimes the system reboots and sometimes black screen. Tried some configuration early/late KMS, platform/shutdown suspend mode and pm_trace just gives:

[ 0.627711] Magic number: 1:163:177
[ 0.627788] acpi device:0d: hash matches

I guess the bios is FUBAR on this device.
Comment by robert r (crobe) - Wednesday, 10 February 2016, 11:47 GMT
Two additional thoughts: I was running UEFI in CSM with MBR mode before and switched to UEFI,GPT and GRUB now. This did not help though. Additionally it seems like my RAM has a defect according to memtest86, maybe you should check that too.
Comment by Milan Oravec (migo) - Wednesday, 10 February 2016, 21:40 GMT
Hi, I've been on UEFI, GPT and GRUB from the beginning. I hope that my RAM is OK, because it is soldered onto the motherboard :( and there are no memory slots. That is the biggest disadvantage of this machine. I'll run memtest just to be sure...

In the past I had this situation with broken hibernation somewhere around the 3.8 kernel, as far as I can remember, and it was fixed in the 3.11 or 3.12 cycle, I don't know exactly. Now 3.18 was the last kernel that worked, but after some updates (about 2 months ago) gdm (or the Intel Xorg driver, or Mesa) won't start at all and hangs with a black screen on startup.
Comment by Florent Thiery (fthiery) - Wednesday, 17 February 2016, 15:18 GMT
False alarm, the problem reappeared for me.
Comment by Milan Oravec (migo) - Tuesday, 15 March 2016, 16:37 GMT
Small update for 4.4.x. With 4.4.2 I was able to resume properly 7 times, then a reboot occurred on resume. With the current 4.4.5 I was able to resume correctly only 2 times before the reboot came. Something has changed, because the computer no longer hangs on resume but reboots.
Comment by Florent Thiery (fthiery) - Monday, 04 April 2016, 11:27 GMT
Thanks to an insightful comment in the wiki (https://wiki.archlinux.org/index.php/Talk:Power_management/Suspend_and_hibernate), it seems that we can tweak hibernation modes of systemd (man systemd-sleep.conf). I tested the "shutdown" mode (see below), but it failed on the 2nd attempt (1st worked).

$ cat /etc/systemd/sleep.conf
[Sleep]
HibernateMode=shutdown

From https://www.kernel.org/doc/Documentation/power/basic-pm-debugging.txt: "If neither "platform" nor "shutdown" hibernation mode works, you will need to identify what goes wrong."

Anyone else tried to fiddle with these ?
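
Before switching HibernateMode it can help to see which modes the kernel offers on a given machine. /sys/power/disk lists them, with the currently selected one in square brackets. A minimal sketch follows; the helper name active_disk_mode and the sample string are assumptions for illustration, and the real contents vary per system.

```shell
#!/bin/sh
# Hypothetical helper: print the bracketed (currently selected) entry
# from a /sys/power/disk style string such as "[platform] shutdown reboot".
active_disk_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Sample string for illustration only.
active_disk_mode '[platform] shutdown reboot suspend'

# On a real system, read the live value if the file is available.
if [ -r /sys/power/disk ]; then
    active_disk_mode "$(cat /sys/power/disk)"
fi
```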
Comment by Florent Thiery (fthiery) - Monday, 04 April 2016, 12:01 GMT
I think that the image generation fails; after suspending manually (with the commands below), when resuming fails, it seems that the hibernation image was not properly generated.

# echo platform > /sys/power/disk
# echo disk > /sys/power/state

After rebooting:

$ journalctl -k -b -1 | grep "PM: " | tail -n 1
avril 04 13:53:43 xxx kernel: PM: Hibernation mode set to 'platform'
$ journalctl -k -b -0 | grep "PM: "
avril 04 13:54:34 localhost kernel: PM: Registered nosave memory: [mem 0x00000000-0x00000fff]
avril 04 13:54:34 localhost kernel: PM: Registered nosave memory: [mem 0x0009d000-0x0009dfff]
avril 04 13:54:34 localhost kernel: PM: Registered nosave memory: [mem 0x0009e000-0x0009ffff]
avril 04 13:54:34 localhost kernel: PM: Registered nosave memory: [mem 0x000a0000-0x000dffff]
avril 04 13:54:34 localhost kernel: PM: Registered nosave memory: [mem 0x000e0000-0x000fffff]
avril 04 13:54:34 localhost kernel: PM: Registered nosave memory: [mem 0xba82c000-0xba832fff]
avril 04 13:54:34 localhost kernel: PM: Registered nosave memory: [mem 0xbac89000-0xbb0ccfff]
avril 04 13:54:34 localhost kernel: PM: Registered nosave memory: [mem 0xd8bc9000-0xd8dd8fff]
avril 04 13:54:34 localhost kernel: PM: Registered nosave memory: [mem 0xd8dd9000-0xd8df8fff]
avril 04 13:54:34 localhost kernel: PM: Registered nosave memory: [mem 0xd8df9000-0xd9321fff]
avril 04 13:54:34 localhost kernel: PM: Registered nosave memory: [mem 0xd9322000-0xd9ffefff]
avril 04 13:54:34 localhost kernel: PM: Registered nosave memory: [mem 0xda000000-0xdaffffff]
avril 04 13:54:34 localhost kernel: PM: Registered nosave memory: [mem 0xdb000000-0xdf1fffff]
avril 04 13:54:34 localhost kernel: PM: Registered nosave memory: [mem 0xdf200000-0xf7ffffff]
avril 04 13:54:34 localhost kernel: PM: Registered nosave memory: [mem 0xf8000000-0xfbffffff]
avril 04 13:54:34 localhost kernel: PM: Registered nosave memory: [mem 0xfc000000-0xfebfffff]
avril 04 13:54:34 localhost kernel: PM: Registered nosave memory: [mem 0xfec00000-0xfec00fff]
avril 04 13:54:34 localhost kernel: PM: Registered nosave memory: [mem 0xfec01000-0xfecfffff]
avril 04 13:54:34 localhost kernel: PM: Registered nosave memory: [mem 0xfed00000-0xfed03fff]
avril 04 13:54:34 localhost kernel: PM: Registered nosave memory: [mem 0xfed04000-0xfed1bfff]
avril 04 13:54:34 localhost kernel: PM: Registered nosave memory: [mem 0xfed1c000-0xfed1ffff]
avril 04 13:54:34 localhost kernel: PM: Registered nosave memory: [mem 0xfed20000-0xfedfffff]
avril 04 13:54:34 localhost kernel: PM: Registered nosave memory: [mem 0xfee00000-0xfee00fff]
avril 04 13:54:34 localhost kernel: PM: Registered nosave memory: [mem 0xfee01000-0xfeffffff]
avril 04 13:54:34 localhost kernel: PM: Registered nosave memory: [mem 0xff000000-0xffffffff]
avril 04 13:54:34 localhost kernel: PM: Registering ACPI NVS region [mem 0xba82c000-0xba832fff] (28672 bytes)
avril 04 13:54:34 localhost kernel: PM: Registering ACPI NVS region [mem 0xd8df9000-0xd9321fff] (5410816 bytes)
avril 04 13:54:34 localhost kernel: PM: Checking hibernation image partition /dev/disk/by-uuid/403f654c-925b-4ac3-87c8-dd0fc85d0e9a
avril 04 13:54:34 localhost kernel: PM: Hibernation image not present or could not be loaded.
avril 04 13:54:35 localhost kernel: PM: Starting manual resume from disk
avril 04 13:54:35 localhost kernel: PM: Hibernation image partition 8:2 present
avril 04 13:54:35 localhost kernel: PM: Looking for hibernation image.
avril 04 13:54:35 localhost kernel: PM: Image not found (code -22)
avril 04 13:54:35 localhost kernel: PM: Hibernation image not present or could not be loaded.

Btw my swap should be big enough, right ?

$ LANG=C free -mh
total used free shared buff/cache available
Mem: 15G 2.9G 11G 391M 1.2G 12G
Swap: 16G 0B 16G
Comment by Florent Thiery (fthiery) - Monday, 04 April 2016, 12:11 GMT
Interestingly, here's what i get when resuming succeeds. The interesting bit is that regardless of the resume success, i have an error "Hibernation image not present or could not be loaded." from localhost (i assume it is before systemd launches, i.e. before it mounts the root filesystem and sets the real hostname); after that, systemd seems to run its own resume procedure. Apparently, the image generation logs are also shown when resuming, which means that the system may be hibernating too fast to write the log lines. This also means no log if the system failed to resume...

avril 04 13:53:43 my.hostname.net kernel: PM: Hibernation mode set to 'platform'
HIBERNATING
avril 04 13:54:34 localhost kernel: PM: Registered nosave memory: [mem 0x00000000-0x00000fff]
...
avril 04 13:54:34 localhost kernel: PM: Registered nosave memory: [mem 0xff000000-0xffffffff]
avril 04 13:54:34 localhost kernel: PM: Registering ACPI NVS region [mem 0xba82c000-0xba832fff] (28672 bytes)
avril 04 13:54:34 localhost kernel: PM: Registering ACPI NVS region [mem 0xd8df9000-0xd9321fff] (5410816 bytes)
avril 04 13:54:34 localhost kernel: PM: Checking hibernation image partition /dev/disk/by-uuid/403f654c-925b-4ac3-87c8-dd0fc85d0e9a
avril 04 13:54:34 localhost kernel: PM: Hibernation image not present or could not be loaded.
avril 04 13:54:35 localhost kernel: PM: Starting manual resume from disk
avril 04 13:54:35 localhost kernel: PM: Hibernation image partition 8:2 present
avril 04 13:54:35 localhost kernel: PM: Looking for hibernation image.
avril 04 13:54:35 localhost kernel: PM: Image not found (code -22)
avril 04 13:54:35 localhost kernel: PM: Hibernation image not present or could not be loaded.
avril 04 14:02:35 my.hostname.net kernel: PM: Hibernation mode set to 'platform'
avril 04 14:03:17 my.hostname.net kernel: PM: Syncing filesystems ... done.
avril 04 14:03:17 my.hostname.net kernel: PM: Marking nosave pages: [mem 0x00000000-0x00000fff]
avril 04 14:03:17 my.hostname.net kernel: PM: Marking nosave pages: [mem 0x0009d000-0x000fffff]
avril 04 14:03:17 my.hostname.net kernel: PM: Marking nosave pages: [mem 0xba82c000-0xba832fff]
avril 04 14:03:17 my.hostname.net kernel: PM: Marking nosave pages: [mem 0xbac89000-0xbb0ccfff]
avril 04 14:03:17 my.hostname.net kernel: PM: Marking nosave pages: [mem 0xd8bc9000-0xd9ffefff]
avril 04 14:03:17 my.hostname.net kernel: PM: Marking nosave pages: [mem 0xda000000-0xffffffff]
avril 04 14:03:17 my.hostname.net kernel: PM: Basic memory bitmaps created
avril 04 14:03:17 my.hostname.net kernel: PM: Preallocating image memory... done (allocated 1354207 pages)
avril 04 14:03:17 my.hostname.net kernel: PM: Allocated 5416828 kbytes in 0.30 seconds (18056.09 MB/s)
avril 04 14:03:17 my.hostname.net kernel: PM: freeze of devices complete after 21.014 msecs
avril 04 14:03:17 my.hostname.net kernel: PM: late freeze of devices complete after 12.136 msecs
avril 04 14:03:17 my.hostname.net kernel: PM: noirq freeze of devices complete after 0.474 msecs
avril 04 14:03:17 my.hostname.net kernel: PM: Saving platform NVS memory
avril 04 14:03:17 my.hostname.net kernel: PM: Creating hibernation image:
avril 04 14:03:17 my.hostname.net kernel: PM: Need to copy 1351957 pages
avril 04 14:03:17 my.hostname.net kernel: PM: Normal pages needed: 1351957 + 1024, available pages: 2810595
avril 04 14:03:17 my.hostname.net kernel: PM: Restoring platform NVS memory
avril 04 14:03:17 my.hostname.net kernel: PM: noirq restore of devices complete after 3.357 msecs
avril 04 14:03:17 my.hostname.net kernel: PM: early restore of devices complete after 76.659 msecs
avril 04 14:03:17 my.hostname.net kernel: PM: restore of devices complete after 1614.749 msecs
avril 04 14:03:17 my.hostname.net kernel: PM: Image restored successfully.
avril 04 14:03:17 my.hostname.net kernel: PM: Basic memory bitmaps freed
Comment by robert r (crobe) - Monday, 04 April 2016, 17:52 GMT
Inspired by your last comment about systemd's own resume procedure I checked the wiki and indeed, there is a systemd and an sd-encrypt hook for mkinitcpio, that handles resuming from suspend.

After following this thread about configuration for encrypted root filesystem: https://bbs.archlinux.org/viewtopic.php?id=175740 I was able to suspend and resume 3 times in a row with running Xorg _but_ from tty2, which finally gave me the general protection fault I was looking for, which is a case for reporting upstream I guess.
Comment by Florent Thiery (fthiery) - Tuesday, 05 April 2016, 07:43 GMT
To check that it is not systemd-specific, i changed my hooks in /etc/mkinitcpio.conf to HOOKS="base udev resume autodetect modconf block filesystems keyboard fsck"
$ mkinitcpio -p linux

My kernel cmd line is: /boot/vmlinuz-linux root=UUID=42a163ab-0771-48ac-b296-caf8f549dec0 rw quiet resume=/dev/disk/by-uuid/403f654c-925b-4ac3-87c8-dd0fc85d0e9a

The resume uuid is correct:
$ ls -l /dev/disk/by-uuid/ | grep 403f654c
lrwxrwxrwx 1 root root 10 5 avril 09:26 403f654c-925b-4ac3-87c8-dd0fc85d0e9a -> ../../sda2
$ LANG=C fdisk -l /dev/sda | grep sda2
/dev/sda2 215560192 250068991 34508800 16.5G 82 Linux swap / Solaris
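
The check above can also be scripted. This is a minimal sketch that assumes a kernel command line in the format shown in this comment; resume_arg is a made-up helper name, and it relies on plain word splitting of the command line.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the resume= parameter from a
# kernel command line string (prints nothing if the parameter is absent).
resume_arg() {
    for word in $1; do
        case $word in
            resume=*) printf '%s\n' "${word#resume=}" ;;
        esac
    done
}

# On a real system, show what the running kernel was booted with.
if [ -r /proc/cmdline ]; then
    dev=$(resume_arg "$(cat /proc/cmdline)")
    if [ -n "$dev" ]; then
        echo "resume parameter: $dev"
    else
        echo "no resume= parameter on the kernel command line"
    fi
fi
```

The printed device can then be compared by hand with the swap partition reported by swapon --show.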

Then i hibernated with firefox & chrome opened (with a few tabs):
$ echo platform > /sys/power/disk
$ echo disk > /sys/power/state

Note that i use the following alias to print out the logs:
alias lastpm='LANG=C journalctl -k -b -1 | grep "PM: " | tail -n 1 && echo "Power on" && LANG=C journalctl -k -b -0 | grep "PM: "'

The first resume failed (i.e. black screen), had to hard reset. After bootup:

$ lastpm
Apr 05 09:24:53 my.hostname.net kernel: PM: Hibernation mode set to 'platform'
Power on
Apr 05 09:26:06 my.hostname.net kernel: PM: Registered nosave memory: [mem 0x00000000-0x00000fff]
Apr 05 09:26:06 my.hostname.net kernel: PM: Registered nosave memory: [mem 0x0009d000-0x0009dfff]
Apr 05 09:26:06 my.hostname.net kernel: PM: Registered nosave memory: [mem 0x0009e000-0x0009ffff]
Apr 05 09:26:06 my.hostname.net kernel: PM: Registered nosave memory: [mem 0x000a0000-0x000dffff]
Apr 05 09:26:06 my.hostname.net kernel: PM: Registered nosave memory: [mem 0x000e0000-0x000fffff]
Apr 05 09:26:06 my.hostname.net kernel: PM: Registered nosave memory: [mem 0xba82c000-0xba832fff]
Apr 05 09:26:06 my.hostname.net kernel: PM: Registered nosave memory: [mem 0xbac89000-0xbb0ccfff]
Apr 05 09:26:06 my.hostname.net kernel: PM: Registered nosave memory: [mem 0xd8bc9000-0xd8dd8fff]
Apr 05 09:26:06 my.hostname.net kernel: PM: Registered nosave memory: [mem 0xd8dd9000-0xd8df8fff]
Apr 05 09:26:06 my.hostname.net kernel: PM: Registered nosave memory: [mem 0xd8df9000-0xd9321fff]
Apr 05 09:26:06 my.hostname.net kernel: PM: Registered nosave memory: [mem 0xd9322000-0xd9ffefff]
Apr 05 09:26:06 my.hostname.net kernel: PM: Registered nosave memory: [mem 0xda000000-0xdaffffff]
Apr 05 09:26:06 my.hostname.net kernel: PM: Registered nosave memory: [mem 0xdb000000-0xdf1fffff]
Apr 05 09:26:06 my.hostname.net kernel: PM: Registered nosave memory: [mem 0xdf200000-0xf7ffffff]
Apr 05 09:26:06 my.hostname.net kernel: PM: Registered nosave memory: [mem 0xf8000000-0xfbffffff]
Apr 05 09:26:06 my.hostname.net kernel: PM: Registered nosave memory: [mem 0xfc000000-0xfebfffff]
Apr 05 09:26:06 my.hostname.net kernel: PM: Registered nosave memory: [mem 0xfec00000-0xfec00fff]
Apr 05 09:26:06 my.hostname.net kernel: PM: Registered nosave memory: [mem 0xfec01000-0xfecfffff]
Apr 05 09:26:06 my.hostname.net kernel: PM: Registered nosave memory: [mem 0xfed00000-0xfed03fff]
Apr 05 09:26:06 my.hostname.net kernel: PM: Registered nosave memory: [mem 0xfed04000-0xfed1bfff]
Apr 05 09:26:06 my.hostname.net kernel: PM: Registered nosave memory: [mem 0xfed1c000-0xfed1ffff]
Apr 05 09:26:06 my.hostname.net kernel: PM: Registered nosave memory: [mem 0xfed20000-0xfedfffff]
Apr 05 09:26:06 my.hostname.net kernel: PM: Registered nosave memory: [mem 0xfee00000-0xfee00fff]
Apr 05 09:26:06 my.hostname.net kernel: PM: Registered nosave memory: [mem 0xfee01000-0xfeffffff]
Apr 05 09:26:06 my.hostname.net kernel: PM: Registered nosave memory: [mem 0xff000000-0xffffffff]
Apr 05 09:26:06 my.hostname.net kernel: PM: Registering ACPI NVS region [mem 0xba82c000-0xba832fff] (28672 bytes)
Apr 05 09:26:06 my.hostname.net kernel: PM: Registering ACPI NVS region [mem 0xd8df9000-0xd9321fff] (5410816 bytes)
Apr 05 09:26:06 my.hostname.net kernel: PM: Checking hibernation image partition /dev/disk/by-uuid/403f654c-925b-4ac3-87c8-dd0fc85d0e9a
Apr 05 09:26:06 my.hostname.net kernel: PM: Hibernation image not present or could not be loaded.
Apr 05 09:26:06 my.hostname.net kernel: PM: Starting manual resume from disk
Apr 05 09:26:06 my.hostname.net kernel: PM: Hibernation image partition 8:2 present
Apr 05 09:26:06 my.hostname.net kernel: PM: Looking for hibernation image.
Apr 05 09:26:06 my.hostname.net kernel: PM: Image not found (code -22)
Apr 05 09:26:06 my.hostname.net kernel: PM: Hibernation image not present or could not be loaded.

I then hibernated without apps opened (only a terminal emulator), and this time hibernation resumed properly:

Apr 05 09:26:47 my.hostname.net kernel: PM: Hibernation mode set to 'platform'
Power on
Apr 05 09:27:22 my.hostname.net kernel: PM: Syncing filesystems ... done.
Apr 05 09:27:22 my.hostname.net kernel: PM: Marking nosave pages: [mem 0x00000000-0x00000fff]
Apr 05 09:27:22 my.hostname.net kernel: PM: Marking nosave pages: [mem 0x0009d000-0x000fffff]
Apr 05 09:27:22 my.hostname.net kernel: PM: Marking nosave pages: [mem 0xba82c000-0xba832fff]
Apr 05 09:27:22 my.hostname.net kernel: PM: Marking nosave pages: [mem 0xbac89000-0xbb0ccfff]
Apr 05 09:27:22 my.hostname.net kernel: PM: Marking nosave pages: [mem 0xd8bc9000-0xd9ffefff]
Apr 05 09:27:22 my.hostname.net kernel: PM: Marking nosave pages: [mem 0xda000000-0xffffffff]
Apr 05 09:27:22 my.hostname.net kernel: PM: Basic memory bitmaps created
Apr 05 09:27:22 my.hostname.net kernel: PM: Preallocating image memory... done (allocated 519144 pages)
Apr 05 09:27:22 my.hostname.net kernel: PM: Allocated 2076576 kbytes in 0.23 seconds (9028.59 MB/s)
Apr 05 09:27:22 my.hostname.net kernel: PM: freeze of devices complete after 21.114 msecs
Apr 05 09:27:22 my.hostname.net kernel: PM: late freeze of devices complete after 12.084 msecs
Apr 05 09:27:22 my.hostname.net kernel: PM: noirq freeze of devices complete after 0.473 msecs
Apr 05 09:27:22 my.hostname.net kernel: PM: Saving platform NVS memory
Apr 05 09:27:22 my.hostname.net kernel: PM: Creating hibernation image:
Apr 05 09:27:22 my.hostname.net kernel: PM: Need to copy 516952 pages
Apr 05 09:27:22 my.hostname.net kernel: PM: Normal pages needed: 516952 + 1024, available pages: 3645615
Apr 05 09:27:22 my.hostname.net kernel: PM: Restoring platform NVS memory
Apr 05 09:27:22 my.hostname.net kernel: PM: noirq restore of devices complete after 3.560 msecs
Apr 05 09:27:22 my.hostname.net kernel: PM: early restore of devices complete after 78.341 msecs
Apr 05 09:27:22 my.hostname.net kernel: PM: restore of devices complete after 1612.298 msecs
Apr 05 09:27:22 my.hostname.net kernel: PM: Image restored successfully.
Apr 05 09:27:22 my.hostname.net kernel: PM: Basic memory bitmaps freed

To me, this tends to indicate that the hibernation image generation will fail if too many apps are open. The fact that the hibernation image generation logs do not appear in the journal when hibernation fails is not really helping. Any idea how to ensure that these logs get written to disk ?
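
As a rough cross-check of image size against swap size, the page counts in the "Preallocating image memory" lines can be converted to kilobytes. This sketch assumes 4 KiB pages, which is the usual x86_64 page size but still an assumption here; the helper name pages_to_kib is made up.

```shell
#!/bin/sh
# Hypothetical helper: convert a page count to KiB, assuming 4 KiB pages
# (verify on the machine with: getconf PAGESIZE).
pages_to_kib() {
    echo $(( $1 * 4 ))
}

pages_to_kib 1354207   # pages preallocated in the earlier successful resume log
pages_to_kib 519144    # pages preallocated in the log above
```

Both results agree with the "Allocated ... kbytes" lines that follow them in the logs (5416828 and 2076576), and both are far below the 17668501504-byte swap partition. The roughly 5.2 GiB image from the earlier log resumed successfully, so image size alone does not obviously explain the failures.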
Comment by Florent Thiery (fthiery) - Tuesday, 05 April 2016, 07:46 GMT
Also, journalctl considers a resumed session in the same boot identifier, which also makes it more complicated to debug (don't use my alias, it is not correct because of that).
Comment by Florent Thiery (fthiery) - Thursday, 07 April 2016, 17:23 GMT
For what it's worth
- the LTS kernel doesn't change a thing
- i'm using dual displays; once, after testing repeatedly, the system partially resumed: my left display was looking okay (albeit non responsive, but the mouse was moving), and the right display was blank (black); anyone else with similar setup ?
- i upgraded my BIOS -- no change
- my platform is haswell i7-4771 with Intel(R) HD Graphics 4600
Comment by robert r (crobe) - Friday, 22 April 2016, 05:58 GMT
"Non responsive" does not seem to "look ok" to me ;)

I opened a bug for my specific problem: https://bugzilla.kernel.org/show_bug.cgi?id=116791
Also have a look at the other bugreports there, especially this one: https://bugzilla.kernel.org/show_bug.cgi?id=104771
Comment by Milan Oravec (migo) - Friday, 22 April 2016, 07:16 GMT
Hi, I'm curious whether you will get an answer on kernel.org :) For me, with 4.4.x and 4.5.x kernels the first resume results in a reboot immediately after the image is read from disk. :(
Comment by Florent Thiery (fthiery) - Thursday, 28 April 2016, 08:19 GMT
I don't know about you but for me since linux 4.5 it just never works at all.
Comment by Rafael Nascimento (rafaelndev) - Sunday, 19 June 2016, 14:51 GMT
This patch: https://patchwork.kernel.org/patch/9172981/ + Kernel 4.7-rc3 works for me.
I have been using this patch since the 14th and resume from hibernation has not failed.
Comment by Milan Oravec (migo) - Monday, 20 June 2016, 08:35 GMT
Hi, are you using the default Arch .config or your own customized one? I'll test this new patch to see if it helps.

thanx.
Comment by Rafael Nascimento (rafaelndev) - Monday, 20 June 2016, 11:35 GMT
Yes, i'm using the default '.config'.
Comment by Milan Oravec (migo) - Monday, 20 June 2016, 12:52 GMT
OK, I'm on 4.7.0-rc4 + patch now. It looks promising so far, 5 hibernation cycles without hang and with solid memory pressure.

total used free shared buff/cache available
Mem: 3956424 1496248 1981352 175124 478824 2039752
Swap: 12287996 545852 11742144

I'll keep testing and will report back.

Thanx for great suggestion.
Comment by Milan Oravec (migo) - Monday, 20 June 2016, 12:55 GMT
huh :( next hibernation cycle ends with reboot after image loading again :((( no luck for me... :(
Comment by robert r (crobe) - Tuesday, 12 July 2016, 21:59 GMT
Try 4.7.0-rc7 and check out the two patches from my upstream bug report, the kernel guys worked a miracle and it worked for me for 15 cycles on a full featured (i915, USB, KDE and some applications) system.
Comment by Milan Oravec (migo) - Friday, 15 July 2016, 12:43 GMT
Hi Robert, thanx I'm following your bug report and I'm on 7 successful resume cycles so far. Crossing fingers... :)
Comment by Milan Oravec (migo) - Tuesday, 19 July 2016, 07:24 GMT
Hi, this patch made it! Hibernation is rock solid now for me. Thanx Robert for reporting this upstream. When this patch gets included in the Arch kernel I'll close this bug.
Comment by Florent Thiery (fthiery) - Tuesday, 23 August 2016, 12:04 GMT
Since 4.7 (currently on 4.7.1) it seems i can finally resume (twice in a row already). Do you know if that patch got included ?
Comment by Milan Oravec (migo) - Wednesday, 24 August 2016, 06:34 GMT
Hi, it is not included in vanilla 4.7.2.
Comment by Florent Thiery (fthiery) - Wednesday, 24 August 2016, 07:36 GMT
Yup, just failed to resume, apparently not :p Thanks Milan
Comment by Milan Oravec (migo) - Wednesday, 24 August 2016, 08:50 GMT
Patched 4.7.0 resumed 100 times for me, but I think something is broken in resume from RAM now, because it hangs sometimes and it was 100% reliable before. :(

I'm compiling 4.7.2 now and will see if it is still broken.
Comment by Florent Thiery (fthiery) - Tuesday, 25 October 2016, 08:01 GMT
Is the patch included in 4.8 ? I am not getting resume issues anymore (a few resumes so far).
Comment by Milan Oravec (migo) - Wednesday, 26 October 2016, 18:57 GMT
Hi, yes, all patches are included. But I have another problem: the computer sometimes hangs during hibernation with a black screen after a lot of hibernation cycles. This is another issue for sure and doesn't belong to this bug report. :)
