Summary ------- Intermittent ATA "failed to IDENTIFY" faults Overview and Background ----------------------- This report covers a specific, reproducible case of what appears to be a fairly widespread intermittent issue with the ATA driver in at least some relatively recent Arch kernels, and possibly some older ones as well. The common symptom is that an ATA device fails to be properly recognized either when first encountered by the kernel during boot, or later when the device is physically attached. A common feature is an initial report "ataN.xx: failed to IDENTIFY", and then subsequent attempted operations on the device (e.g. partition identification, mounting) naturally fail as well, since the device is effectively not registered with the driver. The attached syslog snippets demonstrate the problem using insertion of an SD card into the PCMCIA slot of an older (c. 2005) Thinkpad T-43. The first example shows the message sequence when the insertion succeeds, and the second when it fails. Observations ------------ * Even in the successful case, several (typically three) occurrences of the "failed to IDENTIFY" error are seen, spaced at 5-second intervals. Eventually though, the driver evidently does obtain a proper response, and the device is registered and reports its identity and configuration normally. Subsequent operations (in this case, partition identification, mount, and device access) proceed normally. * The fault case behavior occurs roughly 80% of the time. It also begins with a similar sequence of three IDENTIFY faults, but the driver then emits the "qc timeout" message, which does not occur in the successful case. The driver evidently continues (indefinitely) to try to identify the device, but once the "qc timeout" message is seen once, all subseqent attempts fail, simply repeating the "IDENIFTY" fault followed by the "qc timeout" message, ad infinitum. (Observed over several hours.) * During a given boot cycle, once a fault case occurs -- i.e., once the "qc timeout" message is seen during one device inseration -- it seems that it always fails thereafter on all subsequent device insert/remove cycles. For obvious reasons, I can't vouch for the accuracy of this observation over an extremely large number of boot cycles, but it does seem to hold at least over a few a dozen or so. Could be coincidence though, since the a priori failure rate (80% or so) is pretty high to begin with. If it is true though, it suggests that driver state is somehow being preserved across insertion cycles. This surprised me, but is perhaps intentional. * It is definitely not a hardware problem: Fortunately, I happen to have two identically configured T-43's, and the problem occurs on both, using recent 3.5-series kernels from Arch, but never occurs on either using 2.6-series kernels from two older distributions (RHEL5 and Puppy). On these older distros, the messages emitted by the driver seem ordinary and unremarkable, and the device is recognized reliably evry time. (Can provide log output from these other systems if requested). Gratuitous Commentary --------------------- This not a critical problem for me, merely annoying. But judging from googled trouble reports -- assuming they are actually related -- it seems to be frustrating a lot of people in a wide variety of contexts. The intermittent nature of the problem convinces quite a few people people that their hardware is implicated, so they get involved with unsuccessful and time consuming hardware swapping efforts which sometimes at first seem to "fix" the problem, but then later it recurs (probably because it is not due to their hardware.) For this reason, I rated the problem as "high" severity. I have no familiarity with the ATA driver per se and am not pretending to, but do have some "career familiarity" :) with hardware/driver interaction in general. IMO, this problem has the look & feel of a low-level driver issue that has perhaps existed for quite a while -- perhaps years -- but has been exacerbated by some recent change that is making it more visible in a wider variety of contexts. I won't ramble on here as to why I suspect this, but if it's useful to you, will be happy to explain in more detail. Just guesswork, really, but perhaps of interest. Possibly related to ------------------- http://www.spinics.net/lists/linux-ide/msg44350.html http://linux.derkeiler.com/Mailing-Lists/Fedora/2009-04/msg02345.html http://forums.gentoo.org/viewtopic-t-930468.html?sid=a4bad6040cb5ceb11baed1e13c6f94dc ==== EXAMPLE 1: Successful device identification and subsequent operation ============ Oct 21 15:00:10 localhost kernel: [19170.221472] pcmcia_socket pcmcia_socket0: pccard: PCMCIA card inserted into slot 0 Oct 21 15:00:10 localhost kernel: [19170.221488] pcmcia_socket pcmcia_socket0: cs: memory probe 0xbc000000-0xbfffffff: Oct 21 15:00:10 localhost kernel: [19170.238454] clean. Oct 21 15:00:10 localhost kernel: [19170.238927] pcmcia 0.0: pcmcia: registering new device pcmcia0.0 (IRQ: 4) Oct 21 15:00:10 localhost kernel: [19170.335199] scsi2 : pata_pcmcia Oct 21 15:00:10 localhost kernel: [19170.335326] ata3: PATA max PIO0 cmd 0x5100 ctl 0x510e irq 4 Oct 21 15:00:10 localhost kernel: [19170.507575] ata3.01: failed to IDENTIFY (I/O error, err_mask=0x2) Oct 21 15:00:15 localhost kernel: [19175.660902] ata3.01: failed to IDENTIFY (I/O error, err_mask=0x2) Oct 21 15:00:20 localhost kernel: [19180.814233] ata3.01: failed to IDENTIFY (I/O error, err_mask=0x2) Oct 21 15:00:26 localhost kernel: [19185.962007] ata3.00: CFA: Memory Card Adapter II, S2-V1.14, max PIO2 Oct 21 15:00:26 localhost kernel: [19185.962017] ata3.00: 7811072 sectors, multi 0: LBA Oct 21 15:00:26 localhost kernel: [19185.968682] ata3.00: configured for PIO0 Oct 21 15:00:26 localhost kernel: [19185.968929] scsi 2:0:0:0: Direct-Access ATA Memory Card Adap S2-V PQ: 0 ANSI: 5 Oct 21 15:00:26 localhost kernel: [19185.969711] sd 2:0:0:0: [sdb] 7811072 512-byte logical blocks: (3.99 GB/3.72 GiB) Oct 21 15:00:26 localhost kernel: [19185.969839] sd 2:0:0:0: [sdb] Write Protect is off Oct 21 15:00:26 localhost kernel: [19185.969905] sd 2:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA Oct 21 15:00:26 localhost kernel: [19185.979267] sdb: sdb1 Oct 21 15:00:26 localhost kernel: [19185.979813] sd 2:0:0:0: [sdb] Attached SCSI removable disk Oct 21 15:01:07 localhost kernel: [19227.408563] pcmcia_socket pcmcia_socket0: pccard: card ejected from slot 0 Oct 21 15:01:07 localhost kernel: [19227.408663] ata3.00: disabled Oct 21 15:01:07 localhost kernel: [19227.416574] sd 2:0:0:0: [sdb] Stopping disk Oct 21 15:01:07 localhost kernel: [19227.416632] sd 2:0:0:0: [sdb] START_STOP FAILED Oct 21 15:01:07 localhost kernel: [19227.416639] sd 2:0:0:0: [sdb] Oct 21 15:01:07 localhost kernel: [19227.416643] Result: hostbyte=0x04 driverbyte=0x00 # # vim:syn=messages # ============================= End Example 1 =================================== ============ EXAMPLE 2: Unsuccessful device identification ===================== Oct 25 10:08:26 localhost kernel: [147449.145325] pcmcia_socket pcmcia_socket0: pccard: PCMCIA card inserted into slot 0 Oct 25 10:08:26 localhost kernel: [147449.145794] pcmcia 0.0: pcmcia: registering new device pcmcia0.0 (IRQ: 3) Oct 25 10:08:26 localhost kernel: [147449.198570] scsi9 : pata_pcmcia Oct 25 10:08:26 localhost kernel: [147449.198866] ata4: PATA max PIO0 cmd 0x5100 ctl 0x510e irq 3 Oct 25 10:08:26 localhost kernel: [147449.375250] ata4.01: failed to IDENTIFY (I/O error, err_mask=0x2) Oct 25 10:08:31 localhost kernel: [147454.538615] ata4.01: failed to IDENTIFY (I/O error, err_mask=0x2) Oct 25 10:08:36 localhost kernel: [147459.678697] ata4.01: failed to IDENTIFY (I/O error, err_mask=0x2) Oct 25 10:08:46 localhost kernel: [147469.818658] ata4.00: qc timeout (cmd 0x91) Oct 25 10:08:46 localhost kernel: [147469.818669] ata4.00: failed to IDENTIFY (INIT_DEV_PARAMS failed, err_mask=0x4) Oct 25 10:08:57 localhost kernel: [147479.978678] ata4.00: qc timeout (cmd 0x91) Oct 25 10:08:57 localhost kernel: [147479.978690] ata4.00: failed to IDENTIFY (INIT_DEV_PARAMS failed, err_mask=0x4) Oct 25 10:09:07 localhost kernel: [147490.138672] ata4.00: qc timeout (cmd 0x91) Oct 25 10:09:07 localhost kernel: [147490.138684] ata4.00: failed to IDENTIFY (INIT_DEV_PARAMS failed, err_mask=0x4) Oct 25 10:22:45 localhost kernel: [148308.589881] pcmcia_socket pcmcia_socket0: pccard: card ejected from slot 0 # # vim:syn=messages # ============================= End Example 2 ===================================