Replacing a failed disk in a software mirror
Revision as of 23:48, 14 November 2010
So here's the scenario: you have a server whose disks, c0t0d0 and c0t1d0, are mirrored with Solaris DiskSuite. The disk c0t1d0 has failed and you want to replace it without shutting down the box. We have the following metadevices and sub-mirrors:
d0: d1 and d2
d10: d11 and d12
d20: d21 and d22
d30: d31 and d32
d40: d41 and d42
d50: d51 and d52
This procedure has been tested and works on a Sun Fire V120. You may have to alter it slightly depending on the hardware you are running. Consult the User Guide for your server on how best to replace a drive on a running system.
1. Delete the meta databases (state database replicas) stored on the failed disk, in this case on slice 7
metadb -d c0t1d0s7
2. Detach the sub-mirrors that live on the failed disk from their metadevices
metadetach -f d0 d2
metadetach -f d10 d12
metadetach -f d20 d22
metadetach -f d30 d32
metadetach -f d40 d42
metadetach -f d50 d52
The -f option is needed to force the detach, because the disk has failed.
3. Clear the metadevices that were associated with the failed disk
metaclear d2
metaclear d12
metaclear d22
metaclear d32
metaclear d42
metaclear d52
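The detach and clear commands in steps 2 and 3 follow a regular pattern, so they can be generated by a loop instead of typed by hand. The following is a minimal sketch, assuming the mirror and sub-mirror names listed at the top of this page. The function name print_detach_cmds is made up for illustration and is not part of DiskSuite. It only prints the commands (a dry run), and it pairs each metaclear with its metadetach rather than running all detaches first.

```shell
#!/bin/sh
# Dry-run sketch: print the metadetach/metaclear commands for each
# mirror/sub-mirror pair instead of executing them.
# print_detach_cmds is a hypothetical helper, not a DiskSuite command.
print_detach_cmds() {
    # Arguments are consumed two at a time: <mirror> <sub-mirror>
    while [ $# -ge 2 ]; do
        echo "metadetach -f $1 $2"
        echo "metaclear $2"
        shift 2
    done
}

# Pairs taken from the layout described at the top of this page.
print_detach_cmds d0 d2 d10 d12 d20 d22 d30 d32 d40 d42 d50 d52
```

Review the printed list first; once it looks right, the output can be piped to sh to run it.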
4. Find the correct Ap_Id for the failed disk, with the cfgadm command
cfgadm -al
Ap_Id                          Type         Receptacle   Occupant     Condition
c0                             scsi-bus     connected    configured   unknown
c0::dsk/c0t0d0                 disk         connected    configured   unknown
c0::dsk/c0t1d0                 disk         connected    configured   unknown
c1                             scsi-bus     connected    unconfigured unknown
usb0/1                         unknown      empty        unconfigured ok
usb0/2                         unknown      empty        unconfigured ok
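On a box with many attachment points, picking the Ap_Id out by eye is error-prone. The sketch below filters a cfgadm -al listing for the entry whose name ends in the disk name. The helper name find_ap_id is hypothetical, and the script assumes an awk that supports -v (on Solaris that would typically be nawk or /usr/xpg4/bin/awk rather than the old /usr/bin/awk). The sample listing is copied from the output above so the script runs without cfgadm.

```shell
#!/bin/sh
# Sketch: read a `cfgadm -al` listing on stdin and print the Ap_Id
# whose name ends in "/<disk>". find_ap_id is a hypothetical helper.
find_ap_id() {
    awk -v suffix="/$1" '
        # Print the first column only when it ends with the suffix
        length($1) >= length(suffix) &&
        substr($1, length($1) - length(suffix) + 1) == suffix { print $1 }
    '
}

# Sample data from this page; on a live system use instead:
#   cfgadm -al | find_ap_id c0t1d0
find_ap_id c0t1d0 <<'EOF'
Ap_Id                          Type         Receptacle   Occupant     Condition
c0                             scsi-bus     connected    configured   unknown
c0::dsk/c0t0d0                 disk         connected    configured   unknown
c0::dsk/c0t1d0                 disk         connected    configured   unknown
c1                             scsi-bus     connected    unconfigured unknown
EOF
```

With the sample listing this prints c0::dsk/c0t1d0, which is the value to pass to cfgadm -c in the next step.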
5. Unconfigure the device so that you can remove it
cfgadm -c unconfigure c0::dsk/c0t1d0
6. Check that the device is now unconfigured
cfgadm -al
Ap_Id                          Type         Receptacle   Occupant     Condition
c0                             scsi-bus     connected    configured   unknown
c0::dsk/c0t0d0                 disk         connected    configured   unknown
c0::dsk/c0t1d0                 disk         connected    unconfigured unknown
c1                             scsi-bus     connected    unconfigured unknown
usb0/1                         unknown      empty        unconfigured ok
usb0/2                         unknown      empty        unconfigured ok
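If you would rather test the state in a script than read the table, the Occupant column can be pulled out with awk. This is a rough sketch, assuming the five-column layout shown above and an awk that supports -v (nawk or /usr/xpg4/bin/awk on Solaris). The helper name occupant_state is made up for illustration. The sample rows are copied from the listing above.

```shell
#!/bin/sh
# Sketch: print the Occupant column (4th field) for one Ap_Id from a
# `cfgadm -al` listing read on stdin. occupant_state is hypothetical.
occupant_state() {
    awk -v id="$1" '$1 == id { print $4 }'
}

# Sample data from this page; on a live system use instead:
#   cfgadm -al | occupant_state c0::dsk/c0t1d0
occupant_state 'c0::dsk/c0t1d0' <<'EOF'
Ap_Id                          Type         Receptacle   Occupant     Condition
c0                             scsi-bus     connected    configured   unknown
c0::dsk/c0t0d0                 disk         connected    configured   unknown
c0::dsk/c0t1d0                 disk         connected    unconfigured unknown
EOF
```

For the sample rows this prints unconfigured. The same check can be repeated after the new drive is inserted to decide whether a manual configure is needed.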
7. Physically replace the failed drive
8. Use cfgadm to see if the OS has automatically configured the drive
cfgadm -al
If the status of the drive hasn't changed to 'configured', change it manually
cfgadm -c configure c0::dsk/c0t1d0
9. Use the format command to check that the OS can see the drive.
format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
       0. c0t0d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@1c,600000/scsi@2/sd@0,0
       1. c0t1d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@1c,600000/scsi@2/sd@1,0
Specify disk (enter its number):
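The same visibility check can be scripted by scanning the disk listing for the replaced drive. The sketch below assumes the listing format shown above (a number followed by a dot, then the disk name) and an awk that supports -v. The helper name disk_visible is hypothetical. Feeding format an empty line with echo | format is one common way to get the listing without an interactive session, but treat that as an assumption to verify on your release.

```shell
#!/bin/sh
# Sketch: succeed (exit 0) if <disk> appears as a numbered entry in a
# `format` disk listing read on stdin. disk_visible is hypothetical.
disk_visible() {
    awk -v disk="$1" '
        # Entry lines look like: "1. c0t1d0 <SUN72G ...>"
        $1 ~ /^[0-9]+\.$/ && $2 == disk { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Sample data from this page; on a live system one might try:
#   echo | format | disk_visible c0t1d0    (assumption, verify first)
if disk_visible c0t1d0 <<'EOF'
AVAILABLE DISK SELECTIONS:
       0. c0t0d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@1c,600000/scsi@2/sd@0,0
       1. c0t1d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@1c,600000/scsi@2/sd@1,0
EOF
then
    echo "c0t1d0 is visible"
else
    echo "c0t1d0 is missing"
fi
```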
10.