Replacing a failed disk in a software mirror
So here's the scenario: you have a server whose disks, c0t0d0 and c0t1d0, are mirrored with Solaris DiskSuite. The disk c0t1d0 has failed and you want to replace it without shutting down the box. We have the following metadevices and sub-mirrors:
- d0: d1 and d2
- d10: d11 and d12
- d20: d21 and d22
- d30: d31 and d32
- d40: d41 and d42
- d50: d51 and d52
This procedure has been tested and works on a Sun Fire V125. You may have to alter it slightly depending on the hardware you are running. Consult the User Guide for your server on how best to replace a drive on a running system.
1. Delete the meta databases stored on the failed disk; in this case they are on slice 7
metadb -d c0t1d0s7
2. Detach the failed disk's sub-mirrors from the meta devices
metadetach -f d0 d2
metadetach -f d10 d12
metadetach -f d20 d22
metadetach -f d30 d32
metadetach -f d40 d42
metadetach -f d50 d52
The -f option is necessary to force the detach, because the disk has failed.
3. Clear the meta devices that were associated with the failed disk
metaclear d2
metaclear d12
metaclear d22
metaclear d32
metaclear d42
metaclear d52
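Steps 2 and 3 apply the same two commands to every mirror, so the command list can be generated from a single table. The following is a minimal dry-run sketch, assuming the metadevice names used in this article; print_teardown is a hypothetical helper name, and it only prints the commands so that you can review them before running anything.

```shell
# Mirror and failed sub-mirror pairs, as listed at the top of this article.
PAIRS='d0 d2
d10 d12
d20 d22
d30 d32
d40 d42
d50 d52'

# print_teardown (hypothetical helper): prints, but does not run,
# the step 2 detach commands followed by the step 3 clear commands.
print_teardown() {
    echo "$PAIRS" | while read mirror sub; do
        echo "metadetach -f $mirror $sub"
    done
    echo "$PAIRS" | while read mirror sub; do
        echo "metaclear $sub"
    done
}

print_teardown
```

If your layout differs, only the PAIRS table should need to change.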
4. Find the correct Ap_Id for the failed disk, with the cfgadm command
cfgadm -al
Ap_Id            Type       Receptacle   Occupant       Condition
c0               scsi-bus   connected    configured     unknown
c0::dsk/c0t0d0   disk       connected    configured     unknown
c0::dsk/c0t1d0   disk       connected    configured     unknown
c1               scsi-bus   connected    unconfigured   unknown
usb0/1           unknown    empty        unconfigured   ok
usb0/2           unknown    empty        unconfigured   ok
5. Unconfigure the device so that you can remove it
cfgadm -c unconfigure c0::dsk/c0t1d0
6. Check that the device is now unconfigured
cfgadm -al
Ap_Id            Type       Receptacle   Occupant       Condition
c0               scsi-bus   connected    configured     unknown
c0::dsk/c0t0d0   disk       connected    configured     unknown
c0::dsk/c0t1d0   disk       connected    unconfigured   unknown
c1               scsi-bus   connected    unconfigured   unknown
usb0/1           unknown    empty        unconfigured   ok
usb0/2           unknown    empty        unconfigured   ok
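This check, like the similar one in step 8, comes down to reading the Occupant column for one Ap_Id. Here is a small sketch of how it could be scripted, assuming the five-column cfgadm -al layout shown above. occupant_state is a hypothetical helper, and on Solaris the awk call may need to be nawk or /usr/xpg4/bin/awk, since the legacy /usr/bin/awk may not accept -v.

```shell
# occupant_state (hypothetical helper): reads `cfgadm -al` output on stdin
# and prints the Occupant column (4th field) for the given Ap_Id.
occupant_state() {
    awk -v id="$1" '$1 == id { print $4 }'
}

# Sample lines taken from the listing in step 6. On the live system use:
#   cfgadm -al | occupant_state c0::dsk/c0t1d0
printf '%s\n' \
    'c0::dsk/c0t0d0 disk connected configured unknown' \
    'c0::dsk/c0t1d0 disk connected unconfigured unknown' |
    occupant_state c0::dsk/c0t1d0
```

With the sample lines above this prints unconfigured, which is the state you want before pulling the drive.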
7. Physically replace the failed drive
8. Use cfgadm to see if the OS has automatically configured the drive
cfgadm -al
If the status of the drive hasn't changed to 'configured', change it manually
cfgadm -c configure c0::dsk/c0t1d0
9. Use the format command to check that the OS can see the drive.
format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
       0. c0t0d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@1c,600000/scsi@2/sd@0,0
       1. c0t1d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@1c,600000/scsi@2/sd@1,0
Specify disk (enter its number):
10. Copy the primary disk's VTOC to the secondary disk:
prtvtoc /dev/rdsk/c0t0d0s2 | fmthard -s - /dev/rdsk/c0t1d0s2
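Before building on the new label, it may be worth confirming that both disks now carry the same partition table. Below is a minimal sketch, assuming you first save each disk's prtvtoc output to a file. same_vtoc is a hypothetical helper; it ignores the comment lines beginning with *, which include the device name and so always differ between the two disks.

```shell
# same_vtoc (hypothetical helper): succeeds if two saved prtvtoc dumps
# describe the same partitions. Lines starting with "*" are comments.
# Side effect: writes the stripped copies to <file>.parts next to each input.
same_vtoc() {
    grep -v '^\*' "$1" > "$1.parts"
    grep -v '^\*' "$2" > "$2.parts"
    cmp -s "$1.parts" "$2.parts"
}

# Intended use on the live system (not run here; /tmp paths are examples):
#   prtvtoc /dev/rdsk/c0t0d0s2 > /tmp/c0t0d0.vtoc
#   prtvtoc /dev/rdsk/c0t1d0s2 > /tmp/c0t1d0.vtoc
#   same_vtoc /tmp/c0t0d0.vtoc /tmp/c0t1d0.vtoc && echo "VTOCs match"
```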
11. Re-create the meta databases on the new disk
metadb -f -a -c3 /dev/dsk/c0t1d0s7
12. Re-create the meta devices on the new disk
metainit d2 1 1 c0t1d0s0
metainit d12 1 1 c0t1d0s1
metainit d22 1 1 c0t1d0s3
metainit d32 1 1 c0t1d0s4
metainit d42 1 1 c0t1d0s5
metainit d52 1 1 c0t1d0s6
13. Attach the meta devices from the new disk to the primary meta devices
metattach d0 d2
metattach d10 d12
metattach d20 d22
metattach d30 d32
metattach d40 d42
metattach d50 d52
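Steps 12 and 13 both follow from one mapping of mirror, sub-mirror and slice. As a rough sketch, assuming the slice layout used in this article, the commands can be printed from a table for review; print_rebuild is a hypothetical helper name and nothing is executed.

```shell
# Columns: mirror, new sub-mirror, slice on the replacement disk (from step 12).
LAYOUT='d0 d2 c0t1d0s0
d10 d12 c0t1d0s1
d20 d22 c0t1d0s3
d30 d32 c0t1d0s4
d40 d42 c0t1d0s5
d50 d52 c0t1d0s6'

# print_rebuild (hypothetical helper): prints, but does not run,
# the step 12 metainit commands followed by the step 13 metattach commands.
print_rebuild() {
    echo "$LAYOUT" | while read mirror sub slice; do
        echo "metainit $sub 1 1 $slice"
    done
    echo "$LAYOUT" | while read mirror sub slice; do
        echo "metattach $mirror $sub"
    done
}

print_rebuild
```

Note that slice 2 (the whole disk) and slice 7 (the meta databases) are deliberately absent from the table.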
You can monitor the progress of mirroring with this command:
metastat | grep -i progress
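If you would rather wait for the resync than poll by hand, the same grep can drive a loop. This is only a sketch: resync_active is a hypothetical helper, and the 60-second interval is an arbitrary choice.

```shell
# resync_active (hypothetical helper): succeeds while the metastat output
# on stdin still contains a line mentioning "progress".
resync_active() {
    grep -i progress > /dev/null
}

# Intended use on the live system (not run here):
#   while metastat | resync_active; do sleep 60; done
#   echo "resync finished"
```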
14. Install the boot block on the new hard disk so that you can boot off it
installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c0t1d0s0