Replacing a failed disk in a ZFS/ZPOOL raid array
Revision as of 01:36, 27 February 2019
This procedure was used on a SPARC S7-2, but it should apply to most modern SPARC-based servers:
1. Identify the failed disk in the zpool array:
# zpool status
  pool: rpool
 state: DEGRADED
config:

        NAME                       STATE     READ WRITE CKSUM
        rpool                      DEGRADED     0     0     0
          mirror-0                 DEGRADED     0     0     0
            c0t5000CCA02F613ACCd0  ONLINE       0     0     0
            c0t5000CCA02D0F6C44d0  UNAVAIL      0     0     0
2. Find the device path for the drive you want to remove:
# diskinfo D:devchassis-path
D:devchassis-path                   c:occupant-compdev
----------------------------------  ---------------------
/dev/chassis/SYS/HDD0/disk          c0t5000CCA02F613ACCd0
/dev/chassis/SYS/HDD1/disk          c0t5000CCA02D0F6C44d0
/dev/chassis/SYS/HDD2               -
/dev/chassis/SYS/HDD3               -
/dev/chassis/SYS/HDD4               -
/dev/chassis/SYS/HDD5               -
/dev/chassis/SYS/HDD6               -
/dev/chassis/SYS/HDD7               -
/dev/chassis/SYS/MB/EUSB_DISK/disk  c1t0d0
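Steps 1 and 2 can be combined into a small lookup. The following is only a sketch: the function names `failed_devices` and `chassis_path` are made up for illustration, the sample text is copied from the outputs above, and the parsing assumes the column layout shown there.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Solaris): they only parse text on stdin.

# Print pool member disks whose STATE column is not ONLINE.
# Reads `zpool status` output; matches names such as c0t5000CCA02D0F6C44d0.
failed_devices() {
    awk '$1 ~ /^c[0-9]+t[0-9A-Fa-f]+d[0-9]+/ && $2 != "ONLINE" { print $1 }'
}

# Print the chassis path for device name $1.
# Reads `diskinfo` output; assumes the device name is the second column.
chassis_path() {
    awk -v dev="$1" '$2 == dev { print $1 }'
}

# Sample text taken from steps 1 and 2.
ZPOOL_SAMPLE='  pool: rpool
 state: DEGRADED
config:
        NAME                       STATE     READ WRITE CKSUM
        rpool                      DEGRADED     0     0     0
          mirror-0                 DEGRADED     0     0     0
            c0t5000CCA02F613ACCd0  ONLINE       0     0     0
            c0t5000CCA02D0F6C44d0  UNAVAIL      0     0     0'

DISKINFO_SAMPLE='/dev/chassis/SYS/HDD0/disk          c0t5000CCA02F613ACCd0
/dev/chassis/SYS/HDD1/disk          c0t5000CCA02D0F6C44d0
/dev/chassis/SYS/HDD2               -
/dev/chassis/SYS/MB/EUSB_DISK/disk  c1t0d0'

dev=$(printf '%s\n' "$ZPOOL_SAMPLE" | failed_devices)
path=$(printf '%s\n' "$DISKINFO_SAMPLE" | chassis_path "$dev")
printf '%s -> %s\n' "$dev" "$path"
# prints: c0t5000CCA02D0F6C44d0 -> /dev/chassis/SYS/HDD1/disk
```

On a live system the same functions would be fed from the real commands, for example `zpool status rpool | failed_devices`.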
3. Check the drive's status:
# cfgadm -al
Ap_Id                          Type         Receptacle   Occupant     Condition
/SYS/DBP/NVME0                 unknown      empty        unconfigured unknown
/SYS/DBP/NVME1                 unknown      empty        unconfigured unknown
/SYS/DBP/NVME2                 unknown      empty        unconfigured unknown
/SYS/DBP/NVME3                 unknown      empty        unconfigured unknown
c3                             scsi-sas     connected    configured   unknown
c3::w5000cca02f613acd,0        disk-path    connected    configured   unknown
c4                             scsi-sas     connected    configured   unknown
c4::w5000CCA02D0F6C45,0        disk-path    connected    configured   unknown
usb0/1                         usb-storage  connected    configured   ok
usb0/2                         usb-hub      connected    configured   ok
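NOTICE the slight mismatch between the output of the last two commands: cfgadm lists the failed drive as w5000CCA02D0F6C45, while diskinfo shows it as c0t5000CCA02D0F6C44d0, so the two identifiers differ in the final hex digit.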
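Because of this mismatch, a plain string comparison will not pair a cfgadm Ap_Id with its device name. The sketch below compares the two while ignoring case and the last hex digit. This is a heuristic based only on the pattern visible in the outputs on this page, and the function names are invented for illustration, so always confirm the match by eye before unconfiguring anything.

```shell
#!/bin/sh
# Hypothetical helpers: pure string handling, no Solaris commands are run.

# Extract the WWN from a cfgadm Ap_Id such as c4::w5000CCA02D0F6C45,0
apid_wwn() {
    printf '%s\n' "$1" | sed -e 's/^.*::w//' -e 's/,.*$//' \
        | tr '[:lower:]' '[:upper:]'
}

# Extract the WWN from a device name such as c0t5000CCA02D0F6C44d0
dev_wwn() {
    printf '%s\n' "$1" | sed -e 's/^c[0-9]*t//' -e 's/d[0-9]*$//' \
        | tr '[:lower:]' '[:upper:]'
}

# Succeed if Ap_Id $1 and device $2 share a WWN apart from the last digit.
# ASSUMPTION: the identifiers differ only in the final hex digit, as seen above.
same_drive() {
    a=$(apid_wwn "$1")
    d=$(dev_wwn "$2")
    [ "${a%?}" = "${d%?}" ]
}

if same_drive 'c4::w5000CCA02D0F6C45,0' 'c0t5000CCA02D0F6C44d0'; then
    echo "match"
else
    echo "no match"
fi
# prints: match
```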
4. Unconfigure the drive:
# cfgadm -c unconfigure c4::w5000CCA02D0F6C45,0
and check that it worked:
# cfgadm -al
Ap_Id                          Type         Receptacle   Occupant     Condition
/SYS/DBP/NVME0                 unknown      empty        unconfigured unknown
/SYS/DBP/NVME1                 unknown      empty        unconfigured unknown
/SYS/DBP/NVME2                 unknown      empty        unconfigured unknown
/SYS/DBP/NVME3                 unknown      empty        unconfigured unknown
c3                             scsi-sas     connected    configured   unknown
c3::w5000cca02f613acd,0        disk-path    connected    configured   unknown
c4                             scsi-sas     connected    configured   unknown
c4::w5000CCA02D0F6C45,0        disk-path    connected    unconfigured unknown
5. Turn on the OK to Remove indicator for that drive:
# fmadm set-indicator /dev/chassis/SYS/HDD1/disk ok2rm on
and check that it worked:
# fmadm get-indicator /dev/chassis/SYS/HDD1/disk ok2rm
The indicator (ok2rm) is set to on.
6. Remove the failed drive and replace it with the new one.
7. The new drive should be configured automatically, but check it anyway:
# cfgadm -al
Ap_Id                          Type         Receptacle   Occupant     Condition
/SYS/DBP/NVME0                 unknown      empty        unconfigured unknown
/SYS/DBP/NVME1                 unknown      empty        unconfigured unknown
/SYS/DBP/NVME2                 unknown      empty        unconfigured unknown
/SYS/DBP/NVME3                 unknown      empty        unconfigured unknown
c3                             scsi-sas     connected    configured   unknown
c3::w5000cca02f613acd,0        disk-path    connected    configured   unknown
c4                             scsi-sas     connected    configured   unknown
c4::w5000cca07d293ca1,0        disk-path    connected    configured   unknown
usb0/1                         usb-storage  connected    configured   ok
usb0/2                         usb-hub      connected    configured   ok
If it is not, configure it with:
# cfgadm -c configure c4::w5000cca07d293ca1,0
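The check in steps 4 and 7 amounts to reading the Occupant column for one Ap_Id. A minimal sketch of that check follows. `occupant_state` is a made-up helper name, the sample lines are copied from the listing in step 7, and the parsing assumes Occupant is the fourth whitespace-separated column, as in the output shown on this page.

```shell
#!/bin/sh
# Hypothetical helper: print the Occupant state for Ap_Id $1,
# reading `cfgadm -al` output on stdin.
occupant_state() {
    awk -v id="$1" '$1 == id { print $4 }'
}

# Sample lines taken from the step 7 listing.
CFGADM_SAMPLE='Ap_Id                          Type         Receptacle   Occupant     Condition
/SYS/DBP/NVME0                 unknown      empty        unconfigured unknown
c3::w5000cca02f613acd,0        disk-path    connected    configured   unknown
c4::w5000cca07d293ca1,0        disk-path    connected    configured   unknown'

printf '%s\n' "$CFGADM_SAMPLE" | occupant_state 'c4::w5000cca07d293ca1,0'
# prints: configured

# On a live system the same test could guard the configure step, e.g.:
#   id='c4::w5000cca07d293ca1,0'
#   [ "$(cfgadm -al | occupant_state "$id")" = configured ] \
#       || cfgadm -c configure "$id"
```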
8. Get the name of the new drive from the format command:
# format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
       0. c0t5000CCA02F613ACCd0 <HGST-H101860SFSUN600G-A990-558.91GB>
          /scsi_vhci/disk@g5000cca02f613acc
          /dev/chassis/SYS/HDD0/disk
       1. c0t5000CCA07D293CA0d0 <HGST-H101860SFSUN600G-A990-558.91GB>
          /scsi_vhci/disk@g5000cca07d293ca0
          /dev/chassis/SYS/HDD1/disk
       2. c1t0d0 <VT-eUSB-7722-1.91GB>
          /pci@300/pci@1/pci@0/pci@2/usb@0/storage@1/disk@0,0
          /dev/chassis/SYS/MB/EUSB_DISK/disk
Specify disk (enter its number):
9. Replace the failed disk with the new disk in the zpool:
# zpool replace rpool c0t5000CCA02D0F6C44d0 c0t5000CCA07D293CA0d0
10. Check the progress with:
# zpool status
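Instead of re-running the status command by hand, the wait can be scripted. The sketch below polls the pool's health property until it reads ONLINE. The function names and the retry and delay parameters are illustrative choices, and it assumes the pool stays DEGRADED until the resilver has finished.

```shell
#!/bin/sh
# Report the health of pool $1. `zpool list -H -o health` prints a single
# word such as ONLINE or DEGRADED, with no header line.
pool_health() {
    zpool list -H -o health "$1"
}

# Hypothetical helper: poll until pool $1 reports ONLINE.
# $2 = maximum number of attempts, $3 = seconds between attempts (default 60).
# Returns 0 once the pool is ONLINE, 1 if the attempts run out.
wait_until_online() {
    pool=$1
    tries=$2
    delay=${3:-60}
    while [ "$tries" -gt 0 ]; do
        if [ "$(pool_health "$pool")" = "ONLINE" ]; then
            return 0
        fi
        tries=$((tries - 1))
        if [ "$tries" -gt 0 ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Example (not run here): check rpool once a minute for up to two hours.
#   wait_until_online rpool 120 60 && echo "rpool is ONLINE"
```

Keeping `pool_health` as a separate function means it can be swapped for a stub when trying the loop on a machine without ZFS.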
11. Once the resilver has finished and zpool status shows the pool as ONLINE again, install the boot block on the new disk:
# installboot -f -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c0t5000CCA07D293CA0d0s0