
Disk reports predicted failure event 'disk.ioRecoveredError'


Recovered error predictive failure alert

The following error is reported in the NetApp ONTAP cluster node event log:

  MM/DD/YYYY HH:MM:SS Cluster-02 ERROR disk.ioRecoveredError.pfa: Recovered error predictive failure alert on disk 1c.xx.xx: op 0x2a:b1576e00:0200 sector 0 SCSI: recovered error - Disk reports predicted failure event (1 5d 0 32)

  Enclosure #    : 1
  SMART ASC #    : 5D
  Connector ID # : 0
  SMART ASCQ #   : 32

Event log identifier: disk.ioRecoveredError.pfa
Severity: ERROR

Description: This event is emitted when a disk determines that it will fail shortly, i.e. when a threshold internal to the disk indicates that a failure is imminent.

Next step: The node name is clearly indicated in the error, along with the physical disk details, so you can simply run either:

  cluster::> aggregate show-status -node <node-name>

or:

  cluster::> node run -node cluster-0x
  cluster-01> aggr status -r
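The "(1 5d 0 32)" tuple in the event text carries the SCSI sense information. A minimal sketch of decoding it, assuming (per the log's own labels) the four fields are sense key, SMART ASC, connector ID, and SMART ASCQ; sense key 0x1 is the standard SCSI RECOVERED ERROR, and ASC 0x5D is the "failure prediction threshold exceeded" family that triggers the predictive-failure flag:

```python
# Sketch: decode the "(1 5d 0 32)" tuple from the event text.
# Field order is assumed from the log's own labels:
#   SCSI sense key, SMART ASC, connector ID, SMART ASCQ.

SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
}

def decode_pfa_tuple(raw: str) -> dict:
    """Parse a '(key asc conn ascq)' tuple of hex fields into named values."""
    key, asc, conn, ascq = (int(tok, 16) for tok in raw.strip("()").split())
    return {
        "sense_key": SENSE_KEYS.get(key, f"0x{key:x}"),
        "asc": asc,
        "connector_id": conn,
        "ascq": ascq,
        # ASC 0x5D is the SCSI "failure prediction threshold exceeded"
        # family, which is why ONTAP flags the disk for predictive failure.
        "predictive_failure": asc == 0x5D,
    }

info = decode_pfa_tuple("(1 5d 0 32)")
print(info["sense_key"], hex(info["asc"]), info["predictive_failure"])
```

This only names the fields the log already prints; the exact ASCQ 0x32 sub-cause varies by drive vendor, so it is left as a raw value.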
Either command will show the status 'prefail' against the disk reporting the disk.ioRecoveredError SCSI errors. If so, only two things can happen:

1) If there is a matching spare disk assigned on that node: it will be automatically selected for Rapid RAID Recovery. In this process, the contents of the prefailed disk are copied to the spare. At the end of the copy, the prefailed disk is removed from the RAID configuration; the node spins it down and marks it as 'broken' so that it can be removed from the shelf. As shown in the following output:

  [Rapid RAID Recovery has begun]
  data 1.xx.xx 0 SAS 10000 1.63TB 1.64TB (prefail, copy in progress)  <- disk reporting the error
  data 1.xx.xx 0 SAS 10000 1.63TB 1.64TB (2% copied)                  <- new spare

2) If there are no spares on that node: the disk will eventually fail and the RAID group will go into a degraded state. Once a suitable spare disk is available, the contents of the removed (broken) disk are reconstructed onto that spare. Until that happens, the RAID group remains degraded, and its performance can suffer latency issues depending on the RAID-group disk utilization %. Not an ideal situation.

Proactive action: Check whether you have a matching spare on the partner node. If so, assign it to the node that owns the failing disk, so that the data can be copied to the new disk before it actually fails.

May, 2019
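Spotting the 'prefail' status can be automated. A hedged sketch that scans captured `aggr status -r` output for prefailed disks; the line format and the disk IDs in the sample are assumptions modeled on the excerpt above, and real ONTAP output may differ by version:

```python
# Sketch: scan captured 'aggr status -r' output for disks in 'prefail'.
# The line layout and disk IDs below are hypothetical, modeled on the
# sample output quoted in the text.

import re

SAMPLE = """\
data 1.10.5 0 SAS 10000 1.63TB 1.64TB (prefail, copy in progress)
data 1.10.6 0 SAS 10000 1.63TB 1.64TB (2% copied)
data 1.10.7 0 SAS 10000 1.63TB 1.64TB (normal)
"""

def find_prefail_disks(status_output: str) -> list[str]:
    """Return disk IDs whose trailing status field contains 'prefail'."""
    disks = []
    for line in status_output.splitlines():
        # Second whitespace-separated token is the disk ID; the
        # parenthesized field at end of line is the status.
        m = re.match(r"\S+\s+(\S+).*\((.*)\)\s*$", line)
        if m and "prefail" in m.group(2):
            disks.append(m.group(1))
    return disks

print(find_prefail_disks(SAMPLE))  # -> ['1.10.5']
```

A non-empty result is the cue to check the partner node for a matching spare before the disk fails outright.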