Replacing a failed disk in a ZFS/zpool RAID array
This procedure has been used on a SPARC S7-2, but should apply to most modern SPARC-based servers.
1. Identify the failed disk in the zpool array:
# zpool status
  pool: rpool
 state: DEGRADED
config:

        NAME                       STATE     READ WRITE CKSUM
        rpool                      DEGRADED     0     0     0
          mirror-0                 DEGRADED     0     0     0
            c0t5000CCA02F613ACCd0  ONLINE       0     0     0
            c0t5000CCA02D0F6C44d0  UNAVAIL      0     0     0
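On a larger pool it can help to filter the status output down to the devices that are not ONLINE. The following is a minimal sketch, not part of the original procedure; it assumes leaf disks are named in the cNt...dN form seen above, so the pool row, the mirror-0 row and the header lines are skipped. The function name `faulted_devices` is hypothetical.

```shell
# Hypothetical helper: print leaf devices from `zpool status` whose STATE
# column is anything other than ONLINE.
# Assumption: leaf disks are named cNt...dN (e.g. c0t5000CCA02D0F6C44d0);
# rows such as "rpool" and "mirror-0" do not match that pattern.
faulted_devices() {
    # stdin: `zpool status` text; stdout: one device name per line
    awk '$1 ~ /^c[0-9]+t/ && $2 != "ONLINE" { print $1 }'
}

# Usage on the live system:
#   zpool status rpool | faulted_devices
```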
2. Find the chassis path (drive bay) for the drive you want to remove:
# diskinfo
D:devchassis-path                  c:occupant-compdev
---------------------------------- ---------------------
/dev/chassis/SYS/HDD0/disk         c0t5000CCA02F613ACCd0
/dev/chassis/SYS/HDD1/disk         c0t5000CCA02D0F6C44d0
/dev/chassis/SYS/HDD2              -
/dev/chassis/SYS/HDD3              -
/dev/chassis/SYS/HDD4              -
/dev/chassis/SYS/HDD5              -
/dev/chassis/SYS/HDD6              -
/dev/chassis/SYS/HDD7              -
/dev/chassis/SYS/MB/EUSB_DISK/disk c1t0d0
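The lookup in this step is just "find the row whose occupant is the failed device". A minimal sketch, assuming the default two-column `diskinfo` output shown above; the function name `chassis_path_for` is hypothetical.

```shell
# Hypothetical helper: print the /dev/chassis path whose occupant matches
# the device name given as $1.
# Assumption: stdin is default `diskinfo` output (D:devchassis-path in
# column 1, c:occupant-compdev in column 2); empty bays show "-".
chassis_path_for() {
    awk -v dev="$1" '$2 == dev { print $1 }'
}

# Usage on the live system:
#   diskinfo | chassis_path_for c0t5000CCA02D0F6C44d0
```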
3. Check the drive's status and find its attachment point (Ap_Id):
# cfgadm -al
Ap_Id                    Type         Receptacle Occupant     Condition
/SYS/DBP/NVME0           unknown      empty      unconfigured unknown
/SYS/DBP/NVME1           unknown      empty      unconfigured unknown
/SYS/DBP/NVME2           unknown      empty      unconfigured unknown
/SYS/DBP/NVME3           unknown      empty      unconfigured unknown
c3                       scsi-sas     connected  configured   unknown
c3::w5000cca02f613acd,0  disk-path    connected  configured   unknown
c4                       scsi-sas     connected  configured   unknown
c4::w5000CCA02D0F6C45,0  disk-path    connected  configured   unknown
usb0/1                   usb-storage  connected  configured   ok
usb0/2                   usb-hub      connected  configured   ok
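Note the slight mismatch between the outputs of the last two commands: diskinfo (and zpool) call the failed drive c0t5000CCA02D0F6C44d0, while cfgadm lists it under controller c4 as c4::w5000CCA02D0F6C45,0. The WWN in the cfgadm Ap_Id is one higher in its last hex digit (...45 rather than ...44), so do not expect an exact match.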
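Because of that mismatch, an exact string search for the WWN will not find the Ap_Id. The sketch below is a heuristic based only on the outputs in this note: it assumes the two WWNs differ only in their last hex digit (and possibly in letter case), so it compares everything except that final digit. The function name `apid_for_device` is hypothetical; always eyeball the result before unconfiguring anything.

```shell
# Hypothetical helper: find the cfgadm Ap_Id for a cNt<WWN>dN device name.
# Assumption (from the outputs above, not a documented rule): the Ap_Id WWN
# differs from the device-name WWN only in the last hex digit and in case.
apid_for_device() {
    # c0t5000CCA02D0F6C44d0 -> 5000cca02d0f6c4 (WWN minus last digit, lowercase)
    prefix=$(printf '%s\n' "$1" |
        sed -e 's/^c[0-9]*t//' -e 's/d[0-9]*$//' -e 's/.$//' |
        tr 'A-Z' 'a-z')
    # stdin: `cfgadm -al` output; compare column 1 case-insensitively
    awk -v p="$prefix" '{ id = tolower($1) } index(id, "::w" p) > 0 { print $1 }'
}

# Usage on the live system:
#   cfgadm -al | apid_for_device c0t5000CCA02D0F6C44d0
```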
4. Unconfigure the drive:
# cfgadm -c unconfigure c4::w5000CCA02D0F6C45,0
and check that it worked:
# cfgadm -al
Ap_Id                    Type         Receptacle Occupant     Condition
/SYS/DBP/NVME0           unknown      empty      unconfigured unknown
/SYS/DBP/NVME1           unknown      empty      unconfigured unknown
/SYS/DBP/NVME2           unknown      empty      unconfigured unknown
/SYS/DBP/NVME3           unknown      empty      unconfigured unknown
c3                       scsi-sas     connected  configured   unknown
c3::w5000cca02f613acd,0  disk-path    connected  configured   unknown
c4                       scsi-sas     connected  configured   unknown
c4::w5000CCA02D0F6C45,0  disk-path    connected  unconfigured unknown
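The thing to check in that output is the Occupant column of the drive's row. A minimal sketch that extracts it, assuming the five-column `cfgadm -al` layout shown above (Occupant is the fourth field); the function name `occupant_of` is hypothetical.

```shell
# Hypothetical helper: print the Occupant state (configured/unconfigured)
# for the Ap_Id given as $1.
# Assumption: stdin is `cfgadm -al` output with columns
# Ap_Id Type Receptacle Occupant Condition.
occupant_of() {
    awk -v id="$1" '$1 == id { print $4 }'
}

# Usage on the live system (expect "unconfigured" after step 4):
#   cfgadm -al | occupant_of c4::w5000CCA02D0F6C45,0
```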
5. Turn on the OK-to-Remove (ok2rm) indicator for that drive, using the chassis path from step 2:
# fmadm set-indicator /dev/chassis/SYS/HDD1/disk ok2rm on
and check that it worked:
# fmadm get-indicator /dev/chassis/SYS/HDD1/disk ok2rm
The indicator (ok2rm) is set to on.
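The set-then-check pair in this step can be wrapped so it only reports success when the indicator really is on. A minimal sketch that uses only the two `fmadm` subcommands shown above; the success test relies on the "is set to on" wording of the get-indicator output, and the function name `set_ok2rm` is hypothetical.

```shell
# Hypothetical helper: turn on ok2rm for a chassis path and confirm it.
# Assumption: fmadm get-indicator reports "... is set to on." when lit,
# as in the output above.
set_ok2rm() {
    fmadm set-indicator "$1" ok2rm on || return 1
    fmadm get-indicator "$1" ok2rm | grep -q 'is set to on'
}

# Usage on the live system:
#   set_ok2rm /dev/chassis/SYS/HDD1/disk && echo "HDD1 is ready to pull"
```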
6. Remove the failed drive and insert the new one in the same bay.
7. The new drive should be configured automatically, but check it anyway:
# cfgadm -al
Ap_Id                    Type         Receptacle Occupant     Condition
/SYS/DBP/NVME0           unknown      empty      unconfigured unknown
/SYS/DBP/NVME1           unknown      empty      unconfigured unknown
/SYS/DBP/NVME2           unknown      empty      unconfigured unknown
/SYS/DBP/NVME3           unknown      empty      unconfigured unknown
c3                       scsi-sas     connected  configured   unknown
c3::w5000cca02f613acd,0  disk-path    connected  configured   unknown
c4                       scsi-sas     connected  configured   unknown
c4::w5000cca07d293ca1,0  disk-path    connected  configured   unknown
usb0/1                   usb-storage  connected  configured   ok
usb0/2                   usb-hub      connected  configured   ok
If it is not, configure it with:
# cfgadm -c configure c4::w5000cca07d293ca1,0
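The new drive appears under the same controller (c4) with a new WWN, so listing the disk-path rows on that controller shows both its Ap_Id and whether it still needs configuring. A minimal sketch, assuming the `cfgadm -al` layout above; the function name `disks_on_controller` is hypothetical.

```shell
# Hypothetical helper: print "Ap_Id Occupant" for every disk-path
# attachment point on the controller given as $1 (e.g. c4).
# Assumption: stdin is `cfgadm -al` output in the layout shown above.
disks_on_controller() {
    awk -v c="$1" 'index($1, c "::") == 1 && $2 == "disk-path" { print $1, $4 }'
}

# Usage on the live system: configure anything on c4 left unconfigured.
#   cfgadm -al | disks_on_controller c4 |
#   while read -r apid occupant; do
#       [ "$occupant" = unconfigured ] && cfgadm -c configure "$apid"
#   done
```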
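8. Get the name of the new drive from the format command (it is listed under the same chassis path, /dev/chassis/SYS/HDD1/disk; press Ctrl-C at the prompt to exit without selecting a disk):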
# format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
       0. c0t5000CCA02F613ACCd0 <HGST-H101860SFSUN600G-A990-558.91GB>
          /scsi_vhci/disk@g5000cca02f613acc
          /dev/chassis/SYS/HDD0/disk
       1. c0t5000CCA07D293CA0d0 <HGST-H101860SFSUN600G-A990-558.91GB>
          /scsi_vhci/disk@g5000cca07d293ca0
          /dev/chassis/SYS/HDD1/disk
       2. c1t0d0 <VT-eUSB-7722-1.91GB>
          /pci@300/pci@1/pci@0/pci@2/usb@0/storage@1/disk@0,0
          /dev/chassis/SYS/MB/EUSB_DISK/disk
Specify disk (enter its number):
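Each entry in the format list is a numbered device line followed by its physical and chassis paths, so the new device name can be picked out by chassis path. A minimal sketch, assuming the list layout shown above; the function name `device_at` is hypothetical, and feeding format an empty stdin so that it prints the list and exits instead of waiting at the prompt is an assumption to verify on your system.

```shell
# Hypothetical helper: map a /dev/chassis path to the cNt...dN name listed
# just above it in format's disk list.
# Assumption: each entry starts with "N. <device> <label>" and the chassis
# path appears on a later line of the same entry, as in the output above.
device_at() {
    awk -v path="$1" '$1 ~ /^[0-9]+\.$/ { dev = $2 } $1 == path { print dev }'
}

# Usage on the live system (assumes format exits on end of input):
#   format < /dev/null | device_at /dev/chassis/SYS/HDD1/disk
```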
9. Replace the failed disk with the new disk in the zpool:
# zpool replace rpool c0t5000CCA02D0F6C44d0 c0t5000CCA07D293CA0d0
10. Check the resilver progress with:
# zpool status
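Rather than re-running the status command by hand, it can be polled until the resilver is done. A minimal sketch, assuming that while resilvering the scan line of `zpool status` contains the text "resilver in progress"; the function name `wait_for_resilver` and the 60-second default interval are hypothetical choices.

```shell
# Hypothetical helper: poll `zpool status` for a pool until the resilver
# finishes, then print the final status.
# Assumption: an active resilver is reported as "resilver in progress".
wait_for_resilver() {
    pool=$1
    interval=${2:-60}   # seconds between checks (arbitrary default)
    while zpool status "$pool" | grep -q 'resilver in progress'; do
        sleep "$interval"
    done
    zpool status "$pool"
}

# Usage on the live system:
#   wait_for_resilver rpool 120
```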
11. Once the resilver has finished and the pool state is back to ONLINE, install the boot block on the new disk so the system can also boot from it:
# installboot -f -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c0t5000CCA07D293CA0d0s0
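The boot block path and the raw slice can both be derived from the new device name, which avoids retyping the long WWN. A minimal sketch that builds exactly the installboot command used above (slice 0 of the new disk); the function name `install_bootblk` and the `DRY_RUN` variable are hypothetical additions so the command can be reviewed before it is run.

```shell
# Hypothetical helper: install the ZFS boot block on slice 0 of the disk
# named in $1, mirroring the installboot command above.
# Set DRY_RUN=1 to print the command instead of executing it.
install_bootblk() {
    bootblk="/usr/platform/$(uname -i)/lib/fs/zfs/bootblk"
    rdsk="/dev/rdsk/${1}s0"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo installboot -f -F zfs "$bootblk" "$rdsk"
    else
        installboot -f -F zfs "$bootblk" "$rdsk"
    fi
}

# Usage on the live system:
#   DRY_RUN=1 install_bootblk c0t5000CCA07D293CA0d0   # review first
#   install_bootblk c0t5000CCA07D293CA0d0
```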