How To Replace A Failed SVM Disk
Before you replace what you believe is a failed Solaris Volume Manager (SVM) disk, establish whether it has actually failed or is still in the process of failing. The distinction matters because replacing a failed SVM disk takes a little less time than replacing a failing one. Read How To Tell The Difference Between A Failed Disk And A Failing Disk to find out which one your disk is. If your disk hasn’t quite failed yet, follow How To Replace A Failing SVM Disk instead.

Once you have established that the disk has failed, find out whether it contains SVM metadatabase replicas and, if it does, delete them. The examples below assume that the failed disk is c1t1d0.
# metadb | grep c1t1d0
      W   p  l          16              8192            /dev/dsk/c1t1d0s7
      W   p  l          8208            8192            /dev/dsk/c1t1d0s7
      W   p  l          16400           8192            /dev/dsk/c1t1d0s7
#
# metadb -d c1t1d0s7
#
# metadb
        flags           first blk       block count
     a m  p  luo        16              8192            /dev/dsk/c1t0d0s7
     a    p  luo        8208            8192            /dev/dsk/c1t0d0s7
     a    p  luo        16400           8192            /dev/dsk/c1t0d0s7
#
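If you would rather not pick the replica slices out of the metadb output by eye, the lookup can be sketched as a small shell helper. This is only a sketch: replica_slices is a hypothetical name, not an SVM command, and it assumes that the device path is the last field on every replica line, as in the output above.

```shell
# Hypothetical helper (not part of SVM): list the distinct slices of one disk
# that hold metadatabase replicas, given `metadb` output on stdin.
replica_slices() {
    disk=$1
    # Assumption: the device path is the last field on each replica line.
    awk '{ print $NF }' |
        grep "^/dev/dsk/${disk}s[0-9]*\$" |
        sed 's|^/dev/dsk/||' |
        sort -u
}

# Intended use on the affected host. This only prints the delete commands
# so that you can review them before running anything:
#   metadb | replica_slices c1t1d0 | while read -r slice; do
#       echo "metadb -d $slice"
#   done
```

Printing the commands instead of running them is deliberate, since deleting replicas from the wrong disk is not something you want a script to do unattended.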
Unconfigure the failed SVM disk.
# cfgadm -al
Ap_Id                          Type         Receptacle   Occupant     Condition
c0                             scsi-bus     connected    configured   unknown
c0::dsk/c0t0d0                 CD-ROM       connected    configured   unknown
c1                             scsi-bus     connected    configured   unknown
c1::dsk/c1t0d0                 disk         connected    configured   unknown
c1::dsk/c1t1d0                 disk         connected    configured   unknown
c1::dsk/c1t2d0                 disk         connected    configured   unknown
c1::dsk/c1t3d0                 disk         connected    configured   unknown
c2                             scsi-bus     connected    unconfigured unknown
c3                             fc-fabric    connected    configured   unknown
c3::5006016239a02018           disk         connected    configured   unknown
c3::5006016b39a02018           disk         connected    configured   unknown
c3::5006048452a70c17           disk         connected    configured   unknown
c3::5006048c52a70c07           disk         connected    configured   unknown
c4                             fc-fabric    connected    configured   unknown
c4::5006016339a02018           disk         connected    configured   unknown
c4::5006016a39a02018           disk         connected    configured   unknown
c4::5006048452a70c18           disk         connected    configured   unknown
c4::5006048c52a70c08           disk         connected    configured   unknown
usb0/1                         unknown      empty        unconfigured ok
usb0/2                         unknown      empty        unconfigured ok
usb1/1                         unknown      empty        unconfigured ok
usb1/2                         unknown      empty        unconfigured ok
#
# cfgadm -c unconfigure c1::dsk/c1t1d0
cfgadm: Component system is busy, try again: failed to offline:
     Resource              Information
------------------  ------------------------
/dev/dsk/c1t1d0s2   Device being used by VxVM
#
Note: This host uses SVM to manage internal disks and Veritas Volume Manager (VxVM) to manage SAN-attached disks. VxVM keeps track of the internal disks, even though it doesn’t actually manage them, and may not allow you to unconfigure them. To get around this restriction, you may need to forcibly unconfigure the failed SVM disk by passing the -f option to cfgadm.
# cfgadm -f -c unconfigure c1::dsk/c1t1d0
#
# cfgadm -al
[cfgadm -al listing; in the Occupant column, c1::dsk/c1t1d0 now reads unconfigured]
Verify that the failed SVM disk is marked “unconfigured” as above. Sun servers with hot-swappable disks will also have the disk’s blue “ready to remove” LED lit. Pull the failed SVM disk out of the drive bay and insert the new disk. A message like the following will appear in /var/adm/messages.
Jul 20 14:46:09 eap52 rmclomv: [ID 978967 kern.error] DISK @ HDD1 has been inserted.
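The “unconfigured” check described above (and the matching “configured” check after the new disk goes in) can also be sketched in shell. occupant_state is a hypothetical helper name; it assumes the five-column cfgadm -al layout shown earlier, with Occupant as the fourth column, and a POSIX awk that accepts -v (on Solaris that may mean nawk or /usr/xpg4/bin/awk rather than /usr/bin/awk).

```shell
# Hypothetical helper: print the Occupant column for one attachment point,
# given `cfgadm -al` output on stdin.
# Assumption: columns are Ap_Id, Type, Receptacle, Occupant, Condition.
occupant_state() {
    awk -v ap="$1" '$1 == ap { print $4 }'
}

# Intended use on the affected host:
#   state=$(cfgadm -al | occupant_state c1::dsk/c1t1d0)
#   [ "$state" = "unconfigured" ] && echo "safe to pull c1t1d0"
```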
Configure the new disk.
# cfgadm -c configure c1::dsk/c1t1d0
#
# cfgadm -al
[cfgadm -al listing; in the Occupant column, c1::dsk/c1t1d0 reads configured again]
Verify that the new disk has been configured as above. Copy the volume table of contents (VTOC) from the other disk in the mirror set, c1t0d0, onto the new disk.
# prtvtoc /dev/rdsk/c1t0d0s2 | fmthard -s - /dev/rdsk/c1t1d0s2
fmthard: New volume table of contents now in place.
#
If this command returns an error similar to “/dev/rdsk/c1t1d0s2: Cannot get disk geometry”, the new disk has no label yet and you will need to run format to label it.
# format
Searching for disks...done

c1t1d0: configured with capacity of 72.36GB

AVAILABLE DISK SELECTIONS:
       0. c1t0d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@1f,700000/scsi@2/sd@0,0
       1. c1t1d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@1f,700000/scsi@2/sd@1,0
       2. c1t2d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@1f,700000/scsi@2/sd@2,0
       3. c1t3d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@1f,700000/scsi@2/sd@3,0
Specify disk (enter its number): 1
selecting c1t1d0
[disk formatted]
Disk not labeled.  Label it now? y

FORMAT MENU:
        disk       - select a disk
        type       - select (define) a disk type
        partition  - select (define) a partition table
        current    - describe the current disk
        format     - format and analyze the disk
        repair     - repair a defective sector
        label      - write label to the disk
        analyze    - surface analysis
        defect     - defect list management
        backup     - search for backup labels
        verify     - read and display labels
        save       - save new disk/partition definitions
        inquiry    - show vendor, product and revision
        volname    - set 8-character volume name
        !<cmd>     - execute <cmd>, then return
        quit
format> q
#
Recreate the metadatabase replicas on the new disk.
# metadb -a -c 3 c1t1d0s7
#
# metadb
        flags           first blk       block count
     a m  p  luo        16              8192            /dev/dsk/c1t0d0s7
     a    p  luo        8208            8192            /dev/dsk/c1t0d0s7
     a    p  luo        16400           8192            /dev/dsk/c1t0d0s7
     a        u         16              8192            /dev/dsk/c1t1d0s7
     a        u         8208            8192            /dev/dsk/c1t1d0s7
     a        u         16400           8192            /dev/dsk/c1t1d0s7
#
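To confirm that all three replicas were created on the new disk, you could count the replica lines per disk. The helper below is a sketch under the same assumption as before, namely that the device path is the last field of each metadb line. replica_count is a hypothetical name, and the awk -v option may require nawk or /usr/xpg4/bin/awk on Solaris.

```shell
# Hypothetical helper: count the metadatabase replicas that live on one disk,
# given `metadb` output on stdin. Prints 0 when the disk holds none.
replica_count() {
    awk -v dev="/dev/dsk/$1" '
        # A replica belongs to the disk when its path starts with <disk>s.
        index($NF, dev "s") == 1 { n++ }
        END { print n + 0 }
    '
}

# Intended use on the affected host (3 matches the -c 3 used above):
#   [ "$(metadb | replica_count c1t1d0)" -eq 3 ] && echo "replicas recreated"
```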
Update the new disk’s device ID entry in SVM. This step may not be required but it’s a good idea to do it just in case.
# metadevadm -u c1t1d0
Updating Solaris Volume Manager device relocation information for c1t1d0
Old device reloc information:
        id1,sd@THITACHI_HUS103073FL3800_V3X6MDDA
New device reloc information:
        id1,sd@THITACHI_HUS103073FL3800_V3X6MDDA
#
Enable the submirrors on the replacement disk. Start with the swap partition, as this won’t affect any data if SVM runs into a problem. You may enable the submirrors on the new disk in parallel or in sequence. If the I/O load on the system is heavy, do it in sequence; otherwise, enable them in parallel.
# metareplace -e d1 c1t1d0s1
d1: device c1t1d0s1 is enabled
solaris_1# metastat d1
d1: Mirror
    Submirror 0: d11
      State: Okay
    Submirror 1: d21
      State: Resyncing
    Resync in progress: 0 % done
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 10491456 blocks (5.0 GB)

d11: Submirror of d1
    State: Okay
    Size: 10491456 blocks (5.0 GB)
    Stripe 0:
        Device      Start Block  Dbase
        c1t0d0s1           0     No

d21: Submirror of d1
    State: Resyncing
    Size: 10491456 blocks (5.0 GB)
    Stripe 0:
        Device      Start Block  Dbase
        c1t1d0s1           0     No
SVM will resync the submirrors as soon as they are enabled. This is done in the background and may take a fair amount of time depending on the size of the submirrors. Now is a good time to go for a cup of coffee. Don’t forget to check the progress of the resync when you return.
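If you would like the host to watch the resync while you are away, the progress line can be pulled out of the metastat output with a short helper. This is a sketch only: resync_percent is a hypothetical name, and it assumes the “Resync in progress: N % done” wording and whole-number percentage shown in the metastat output above.

```shell
# Hypothetical helper: print the percentage from a
# "Resync in progress: N % done" line, given `metastat` output on stdin.
# Prints nothing when no resync is in progress.
resync_percent() {
    sed -n 's/.*Resync in progress: *\([0-9][0-9]*\) *% done.*/\1/p'
}

# Intended use on the affected host: poll once a minute until the
# progress line disappears from the metastat output for mirror d1.
#   while [ -n "$(metastat d1 | resync_percent)" ]; do
#       sleep 60
#   done
#   echo "d1 resync finished"
```

The same loop is one way to enable submirrors in sequence: wait for it to finish before running metareplace -e on the next mirror.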