Arch Linux


FS#9015 - Silicon Image, Inc. SiI 3114 crashes after a few seconds under heavy load

Attached to Project: Arch Linux
Opened by Lubos Kolouch (kolcon) - Tuesday, 25 December 2007, 10:17 GMT
Last edited by Tobias Powalowski (tpowa) - Friday, 08 February 2008, 17:41 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To No-one
Architecture i686
Severity High
Priority Normal
Reported Version 2007.08-2
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:

I have a
RAID bus controller: Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)
with a SAMSUNG HD753LJ HDD.

When I start copying files to this drive, after a few seconds (under heavy load) I get this
in dmesg:
ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x2400000 action 0x0
ata5.00: BMDMA2 stat 0x8158001
ata5.00: cmd 35/00:08:6f:56:53/00:00:57:00:00/e0 tag 0 cdb 0x0 data 4096 out
res 51/04:01:76:56:53/00:00:57:00:00/f0 Emask 0x1 (device error)
ata5.00: configured for UDMA/100
ata5: EH complete

Sometimes I get a port timeout, followed by a soft reset and then a hard reset.
As a result, the filesystem on the HDD ends up damaged and inconsistent.

This is 100% repeatable; it happens every time.

Additional info:
# uname -a
Linux pcdoma 2.6.23-ARCH #1 SMP PREEMPT Fri Dec 21 19:39:35 UTC 2007 i686 AMD Sempron(tm) 2300+ AuthenticAMD GNU/Linux

# lspci
00:00.0 Host bridge: VIA Technologies, Inc. VT8377 [KT400/KT600 AGP] Host Bridge (rev 80)
00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI Bridge
00:05.0 Network controller: RaLink RT2500 802.11g Cardbus/mini-PCI (rev 01)
00:08.0 RAID bus controller: Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)
00:0a.0 Multimedia audio controller: Creative Labs SB Live! EMU10k1 (rev 08)
00:0a.1 Input device controller: Creative Labs SB Live! Game Port (rev 08)
00:0f.0 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.4 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 86)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8237 ISA bridge [KT600/K8T800/K8T890 South]
00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller (rev 60)
00:12.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 78)
01:00.0 VGA compatible controller: nVidia Corporation NV5 [RIVA TNT2/TNT2 Pro] (rev 15)

# hdparm -i /dev/sdc

/dev/sdc:

Model=SAMSUNG HD753LJ , FwRev=1AA01106, SerialNo=S13UJ1KPB27933
Config={ Fixed }
RawCHS=16383/16/63, TrkSize=34902, SectSize=554, ECCbytes=4
BuffType=DualPortCache, BuffSize=32767kB, MaxMultSect=16, MultSect=?16?
CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=268435455
IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes: pio0 pio1 pio2 pio3 pio4
DMA modes: mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
AdvancedPM=yes: disabled (255) WriteCache=enabled
Drive conforms to: unknown: ATA/ATAPI-3,4,5,6,7

* signifies the current active mode

Steps to reproduce:

1) Mount the drive.
2) Start copying files to the drive.
3) Wait for heavy load; in under 30 seconds it fails.

Closed by Tobias Powalowski (tpowa)
Friday, 08 February 2008, 17:41 GMT
Reason for closing: Not a bug
Additional comments about closing: hardware issue

Comment by Lubos Kolouch (kolcon) - Tuesday, 25 December 2007, 12:26 GMT
The second error is this:

ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata5.00: cmd 35/00:00:0f:d9:29/00:04:01:00:00/e0 tag 0 cdb 0x0 data 524288 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata5: soft resetting port
ata5: port is slow to respond, please be patient (Status 0xd8)
ata5: SRST failed (errno=-16)
ata5: hard resetting port
ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata5.00: configured for UDMA/100
ata5: EH complete

Comment by Tobias Powalowski (tpowa) - Wednesday, 26 December 2007, 08:40 GMT
You could try whether it is solved in the latest rc kernel:
http://www.archlinux.org/~tpowa/2.6.24/

Comment by Lubos Kolouch (kolcon) - Wednesday, 26 December 2007, 09:28 GMT
Unfortunately, same result... What should I test next?

# uname -a
Linux pcdoma 2.6.24-rc6-ARCH #1 SMP PREEMPT Fri Dec 21 07:36:48 UTC 2007 i686 AMD Sempron(tm) 2300+ AuthenticAMD GNU/Linux

ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata5.00: cmd 35/00:00:2f:85:04/00:04:00:00:00/e0 tag 0 dma 524288 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata5.00: status: { DRDY }
ata5: soft resetting link
ata5: port is slow to respond, please be patient (Status 0xd8)
ata5: SRST failed (errno=-16)
ata5: hard resetting link
ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata5.00: configured for UDMA/100
ata5: EH complete
sd 4:0:0:0: [sdc] 1465149168 512-byte hardware sectors (750156 MB)
sd 4:0:0:0: [sdc] Write Protect is off
sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00
sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Comment by Tobias Müller (twam) - Sunday, 30 December 2007, 13:14 GMT
I have similar problems with my HD753LJ whether I connect it to a Silicon Image 3132 or an Intel ICH7.

Look at http://forums.gentoo.org/viewtopic-t-636641.html for details.

Comment by Glenn Matthys (RedShift) - Thursday, 10 January 2008, 00:27 GMT
This looks more like a hardware issue than a software issue. Check that your cabling is correct and that your disk drive is OK (use smartmontools).
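To follow up on the smartmontools suggestion: assuming smartmontools is installed and the drive is still /dev/sdc, `smartctl -H /dev/sdc` prints the drive's overall SMART health assessment and `smartctl -a /dev/sdc` prints all SMART attributes and the drive's error log. That covers the disk itself; to compare cables or controllers it may also help to count how often libata's error handler fires during a copy. The helper below is a hypothetical sketch (`ata_error_summary` is an illustrative name, not part of any Arch or kernel tooling): it reads kernel log text on stdin and counts the `exception Emask` and `hard resetting` lines quoted in this report, per ataN port.

```shell
# Hypothetical helper: summarise libata error-handler events per port.
# Reads kernel log text on stdin, e.g.:  dmesg | ata_error_summary
ata_error_summary() {
    awk '
        # Return the libata port name (e.g. "ata5") found on the current
        # line; works with or without a leading "[ timestamp ]".
        function port(   name) {
            if (match($0, /ata[0-9]+/)) name = substr($0, RSTART, RLENGTH)
            else name = "unknown"
            seen[name] = 1
            return name
        }
        # "ata5.00: exception Emask ..." marks an error-handler event.
        /exception Emask/ { exc[port()]++ }
        # Matches both "hard resetting port" and "hard resetting link".
        /hard resetting/  { hrst[port()]++ }
        END {
            for (p in seen)
                printf "%s exceptions=%d hard_resets=%d\n", p, exc[p] + 0, hrst[p] + 0
        }
    ' | sort
}
```

Running it once with each cable or controller under the same copy workload gives a rough, comparable count; a port that stays at zero under load is a reasonable sign that the combination is working.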
Comment by Lubos Kolouch (kolcon) - Thursday, 10 January 2008, 08:19 GMT
OK, I've sent the HDD for warranty repair. I will report back when I hear from them.

Comment by Lubos Kolouch (kolcon) - Thursday, 07 February 2008, 17:03 GMT
Update: I've got another HDD, this time a 750 GB Western Digital.

Same errors!

Comment by Glenn Matthys (RedShift) - Thursday, 07 February 2008, 19:22 GMT
Have you tried replacing the SATA cable with something more decent? Some motherboards come shipped with very crappy SATA cables, and make sure you don't bend them too much.

I've been running Silicon Image controllers for quite some time and they've never caused any problems, so I'm still pretty sure your hardware is broken.

Comment by Lubos Kolouch (kolcon) - Thursday, 07 February 2008, 19:41 GMT
Well, I tried 4 cables; I do not have any more :)

This is not an on-board controller, but a PCI one...

Tomorrow I will get a controller with a different brand/chipset, so I will try
with that one...

Comment by Lubos Kolouch (kolcon) - Friday, 08 February 2008, 17:34 GMT
So: same WD HDD, new controller, and it works.

sata_via 0000:00:07.0: routed to hard irq line 10
scsi0 : sata_via
scsi1 : sata_via
scsi2 : sata_via
ata1: SATA max UDMA/133 cmd 0x0001e800 ctl 0x0001e80a bmdma 0x0001d800 irq 17
ata2: SATA max UDMA/133 cmd 0x0001e400 ctl 0x0001e40a bmdma 0x0001d808 irq 17
ata3: PATA max UDMA/133 cmd 0x0001e000 ctl 0x0001e00a bmdma 0x0001d810 irq 17
ata1: SATA link down (SStatus 0 SControl 310)
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)

So the question is: bad hardware or a driver problem? How can I tell?

Comment by Roman Kyrylych (Romashka) - Friday, 08 February 2008, 17:39 GMT
I suggest searching the kernel's Bugzilla for this and filing a bug if it's unknown, though in the end it may turn out to be a hardware bug anyway.
