FS#31833 - [linux] 3.4 - 3.6.x suspend fails with mdraid due to async disc suspend

Attached to Project: Arch Linux
Opened by Simeon (bladud) - Sunday, 07 October 2012, 18:07 GMT
Last edited by Tobias Powalowski (tpowa) - Wednesday, 23 January 2013, 15:50 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To Tobias Powalowski (tpowa)
Thomas Bächler (brain0)
Architecture All
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:

With linux 3.4 through 3.6, suspend has been failing for me.

It pauses for 30s or so with a blank screen and then aborts the suspend, putting me back into X.
dmesg told me that the reason for this is that devices sda and sdc failed to suspend.
Note that the partitions on sda and sdc are each the first members of separate md raid 1 arrays.

I found that reverting
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=966f1212e1ac5fe3ddf04479d21488ddb36a2608
on top of linux 3.5.5 fixes the problem.

I also confirm that 3.0.44-1-lts works, but 3.4-3.6 are all broken.

On the off chance I tried both the mdadm_udev and mdadm hooks, with no change.
I'm using systemd 194, but this has been occurring since udev was a separate package.

I believe that suspend works fine if all md arrays are degraded, but I wouldn't swear to it
because it only happened once, by accident.

So it seems that suspending discs async combines poorly with mdraid in my case, but I'm not sure why.
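
For anyone who wants to check whether async suspend is actually switched on for the discs in question, here is a minimal Python sketch. It assumes the kernel exposes the global `/sys/power/pm_async` switch and the per-device `power/async` attribute under `/sys/block/<disc>/device/`; the per-device file may be absent depending on kernel configuration, in which case the sketch reports `None`. The function name and the `sysfs_root` parameter are made up for illustration. Nothing in this report establishes that toggling these settings works around the failure.

```python
from pathlib import Path


def read_async_settings(disks, sysfs_root="/sys"):
    """Return the async suspend settings visible in sysfs.

    Keys: "pm_async" for the global switch, plus one key per disc name.
    Values are the stripped file contents, or None if a file is missing
    or unreadable (assumption: not every kernel exposes every attribute).
    """
    root = Path(sysfs_root)

    def read(path):
        try:
            return path.read_text().strip()
        except OSError:
            return None

    settings = {"pm_async": read(root / "power" / "pm_async")}
    for disk in disks:
        # /sys/block/<disc>/device points at the underlying SCSI device.
        settings[disk] = read(root / "block" / disk / "device" / "power" / "async")
    return settings
```

Calling `read_async_settings(["sda", "sdb", "sdc", "sdd"])` on the affected machine would show the state for the four array members listed below.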

/proc/mdstat:
Personalities : [raid1]
md2 : active raid1 sda1[0] sdb1[2]
      976631296 blocks super 1.2 [2/2] [UU]
      bitmap: 0/8 pages [0KB], 65536KB chunk

md1 : active raid1 sdd3[0] sdc2[1]
      648592128 blocks [2/2] [UU]
      bitmap: 0/5 pages [0KB], 65536KB chunk

md0 : active raid1 sdd1[0] sdc1[1]
      80075840 blocks [2/2] [UU]
      bitmap: 1/1 pages [4KB], 65536KB chunk
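
To cross-check which discs back which arrays, and whether any array is degraded, the listing above can be parsed with a short Python sketch. It is a rough illustration only: it handles active arrays, assumes `sdXN`-style partition names when deriving the disc name, and `parse_mdstat` is a made-up helper name, not part of mdadm.

```python
import re

# Header line of an active array, e.g. "md2 : active raid1 sda1[0] sdb1[2]".
ARRAY_RE = re.compile(r"^(md\d+) : active (\S+) (.*)$")
# One member entry, e.g. "sda1[0]"; trailing flags such as "(F)" are ignored.
MEMBER_RE = re.compile(r"(\S+?)\[\d+\]")
# Status on a following line, e.g. "[2/2] [UU]"; "_" marks a missing member.
STATUS_RE = re.compile(r"\[\d+/\d+\] \[([U_]+)\]")

# The /proc/mdstat contents quoted in this report.
SAMPLE = """\
Personalities : [raid1]
md2 : active raid1 sda1[0] sdb1[2]
      976631296 blocks super 1.2 [2/2] [UU]
      bitmap: 0/8 pages [0KB], 65536KB chunk

md1 : active raid1 sdd3[0] sdc2[1]
      648592128 blocks [2/2] [UU]
      bitmap: 0/5 pages [0KB], 65536KB chunk

md0 : active raid1 sdd1[0] sdc1[1]
      80075840 blocks [2/2] [UU]
      bitmap: 1/1 pages [4KB], 65536KB chunk
"""


def parse_mdstat(text):
    """Map each active md array to its level, members, discs and degraded flag."""
    arrays = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = ARRAY_RE.match(line)
        if header:
            name, level, rest = header.groups()
            members = MEMBER_RE.findall(rest)
            # Assumption: stripping trailing digits turns "sda1" into "sda".
            disks = sorted({re.sub(r"\d+$", "", m) for m in members})
            current = {"level": level, "members": members,
                       "disks": disks, "degraded": None}
            arrays[name] = current
            continue
        status = STATUS_RE.search(line)
        if status and current is not None:
            current["degraded"] = "_" in status.group(1)
    return arrays
```

With the listing from this report, `parse_mdstat(SAMPLE)["md2"]["disks"]` gives `['sda', 'sdb']`, and none of the three arrays is flagged as degraded.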

/proc/scsi/scsi:
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA Model: SAMSUNG HD103UJ Rev: 1AA0
  Type: Direct-Access ANSI SCSI revision: 05
Host: scsi1 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA Model: SAMSUNG HD103UJ Rev: 1AA0
  Type: Direct-Access ANSI SCSI revision: 05
Host: scsi2 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA Model: SAMSUNG HD753LJ Rev: 1AA0
  Type: Direct-Access ANSI SCSI revision: 05
Host: scsi3 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA Model: SAMSUNG HD753LJ Rev: 1AA0
  Type: Direct-Access ANSI SCSI revision: 05
Host: scsi7 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA Model: SAMSUNG HD300LD Rev: WK10
  Type: Direct-Access ANSI SCSI revision: 05

Successful and unsuccessful dmesgs during suspend are attached.

I'm reporting this here before anywhere else, because I'm not sure where else to report it to - is it mdraid or the kernel?

Thanks

Closed by Tobias Powalowski (tpowa)
Wednesday, 23 January 2013, 15:50 GMT
Reason for closing: Upstream
Comment by Tobias Powalowski (tpowa) - Friday, 12 October 2012, 12:28 GMT
You need to report this upstream if the mentioned patch breaks it. We cannot do anything here.
Comment by Simeon (bladud) - Saturday, 13 October 2012, 20:52 GMT
Ok. Where should I report it to? Is there a special list for suspend, or should I just use the kernel bugzilla?
Comment by Greg (dolby) - Saturday, 17 November 2012, 07:51 GMT
Use the kernel bugzilla.
Comment by Greg (dolby) - Monday, 17 December 2012, 23:56 GMT
Upstream report?
Comment by Simeon (bladud) - Monday, 14 January 2013, 18:18 GMT
https://bugzilla.kernel.org/show_bug.cgi?id=48951

Apparently it's not mdraid, but the sata controller.
