http://sunsolve.sun.com/search/document.do?assetkey=1-61-208671-1

Document ID: 208671


Old Document ID: (formerly 73132)
Title: Solaris[TM] Volume Manager software: Replacing Disks (Solaris[TM] 9 Operating System and above)
Copyright Notice: Copyright © 2009 Sun Microsystems, Inc. All Rights Reserved
Update Date: Thu May 01 00:00:00 MDT 2008

Solution 208671: Solaris[TM] Volume Manager software: Replacing Disks (Solaris[TM] 9 Operating System and above)

Description

Beginning with the Solaris[TM] 9 Operating System, Solaris[TM] Volume Manager (VM)
software uses a feature called Device-ID (DevID). This feature identifies each
disk not only by its c#t#d# name, but also by a unique ID that is generated from
the disk's WWN or serial number. Solaris Volume Manager (VM) relies on the Solaris
OS to supply it with each disk's correct DevID.

When a disk fails and is replaced, a specific procedure must be followed to make
sure that the Solaris OS is updated with the new disk's DevID.

If this procedure is not followed exactly, an error like the one below may be seen:

Jun 22 18:22:57 host1 metadevadm: [ID 209699 daemon.error] Invalid device relocation information detected in Solaris Volume Manager

As a result, Solaris OS will not update the DevID until the next reboot, meaning
that although a NEW disk is in the system, the DevID being reported by Solaris OS
to the Solaris VM software is still the OLD disk's DevID.

For example:

If the DevID of c0t1d0 was "SSEAGATE_ST318203_LR7943" and it is replaced with a
new disk (whose DevID would be "SFUJITSU_MAG3182_005268"), Solaris OS will still
report that the c0t1d0 disk has the DevID of "SSEAGATE_ST318203_LR7943" until the
host is rebooted.

Although it is possible to replace the disk without running through this
procedure, the next system reboot will cause the Solaris VM software to fail the
new disk because the DevID of the disk will have changed, and Solaris VM
will not have any knowledge of that new DevID.

To replace a disk, certain commands must be used to unconfigure the disk that is
to be replaced, as well as configure the new disk. This will cause an update of
the Solaris OS device framework, such that the new disk's DevID will be inserted
and the old one removed.

This information applies to disks marked as "failing", as well as disks that have
already failed. The commands to remove/clear/replace metadevice entities deal
initially with the SVM name-placeholders (d10, d1, d30, etc), and not the actual
device names. A "failing" disk is one that still responds to inquiries, but has
experienced errors that could indicate a future full-failure of the disk. A
"failed" disk has already experienced such a failure. The replacement procedures
for either remain essentially the same.

Steps to Follow
PROCEDURE FOR REPLACING MIRRORED DISKS

Given all of the above, the following set of commands should work in all cases
(though depending on the system configuration, some commands may not be
necessary):

To replace a Solaris VM-controlled disk which is part of a mirror, the following
steps must be followed:

1. Run 'metadetach' to detach all submirrors on the failing disk from their
respective mirrors:

metadetach -f <mirror> <submirror>

Note: If the "-f" option is not used, the following message will be returned:

"Attempt an operation on a submirror that has erred component".

Then run 'metaclear' (**) on those submirror devices:

metaclear <submirror>

Verify there are no existing metadevices left on the disk by running:

metastat -p | grep c#t#d#

2. If there are any replicas on this disk, remove them using:

metadb -d c#t#d#s#

Verify there are no existing replicas left on the disk, by running:

metadb | grep c#t#d#

3. If there are any open filesystems on this disk (not under Solaris VM
control), unmount them. If the disk or a slice on the disk is being used
as a dump device, move it temporarily to another disk. You can check the
existing dump device by running "dumpadm" and change the current dump device
using "dumpadm -d <dump-device>". If this is not done the disk will fail to
unconfigure.

4. Run the 'cfgadm' command to remove the failed disk.

cfgadm -c unconfigure c#::dsk/c#t#d#

NOTE: Use the "cfgadm -al" command to obtain the variable "c#::dsk/c#t#d#".
The variable will be listed under the 'Ap_Id' column from the "cfgadm
-al" command's output.

NOTE: If the message "Hardware specific failure: failed to unconfigure SCSI
device: I/O error" appears, check to make sure that you cleared all
replicas and metadevices from the disk, and that the disk is not being
accessed.

NOTE: To replace internal FC-AL disks, follow Technical Instruction < Solution: 214845 >

5. Insert and configure the new disk.

cfgadm -c configure c#::dsk/c#t#d#
cfgadm -al (to confirm that the disk is configured properly)

6. Run 'format', or 'prtvtoc' together with 'fmthard', to put the desired
partition table on the new disk. ('prtvtoc' only prints a partition table;
'fmthard' writes it.)

7. If necessary, recreate any replicas on the new disk:

metadb -a c#t#d#s#

8. Recreate each metadevice to be used as a submirror, then use 'metattach' to
attach those submirrors to the mirrors and start the resync.

NOTE: If the submirror was something other than a simple one-slice concat device,
the metainit command will be different from the one shown here.

metainit <submirror> 1 1 <c#t#d#s#>
metattach <mirror> <submirror>

9. Run 'metadevadm' on the disk to update the SVM database with the new DevID.

metadevadm -u c#t#d#

NOTE: If you get the message "Open of /dev/dsk/c#t#d#s0 failed", it can safely be
ignored (this is a known bug pending a fix).

NOTE: 'metadevadm -u' is usually unnecessary for this replacement procedure, since
the DevID information is completely removed from the SVM database by 'metadetach',
'metaclear' and 'metadb -d' in steps 1 and 2.
On the other hand, 'metadevadm -u' is necessary if the failed disk is replaced by
using 'metareplace -e', as described in
http://docs.sun.com/app/docs/doc/817-2530/6mi6gg8de?a=view
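
The mirror procedure above can be sketched as a dry-run helper that only prints
the commands in order, so they can be reviewed before anything is run. The
function name and its argument layout are illustrative and not part of SVM. The
sketch assumes simple one-slice concat submirrors, at most one replica slice, and
an Ap_Id of the form c#::dsk/c#t#d# (confirm it with "cfgadm -al"). Step 3
(unmounting filesystems and moving the dump device) is not covered.

```shell
#!/bin/sh
# Dry-run sketch: print (never execute) the mirror replacement commands.
# mirror_replace_plan is a hypothetical helper, not an SVM command.
# Usage: mirror_replace_plan <disk> <replica-slice or "-"> <mirror:submirror:slice>...
mirror_replace_plan() {
    disk=$1
    replica=$2
    shift 2
    # Assumption: the Ap_Id follows the c#::dsk/c#t#d# pattern used above.
    apid="${disk%%t*}::dsk/${disk}"

    for spec in "$@"; do                       # step 1
        mirror=${spec%%:*}
        rest=${spec#*:}
        echo "metadetach -f $mirror ${rest%%:*}"
        echo "metaclear ${rest%%:*}"
    done
    if [ "$replica" != "-" ]; then             # step 2
        echo "metadb -d $replica"
    fi
    echo "cfgadm -c unconfigure $apid"         # step 4
    echo "# swap the disk, then:"
    echo "cfgadm -c configure $apid"           # step 5
    echo "# partition the new disk (step 6), then:"
    if [ "$replica" != "-" ]; then             # step 7
        echo "metadb -a $replica"
    fi
    for spec in "$@"; do                       # step 8
        mirror=${spec%%:*}
        rest=${spec#*:}
        echo "metainit ${rest%%:*} 1 1 ${rest#*:}"
        echo "metattach $mirror ${rest%%:*}"
    done
    echo "metadevadm -u $disk"                 # step 9
}

# Values taken from Example 1 of this document.
mirror_replace_plan c0t2d0 c0t2d0s7 d0:d20:c0t2d0s0 d1:d21:c0t2d0s1
```

Printing rather than executing keeps the physical disk swap and the partitioning
step, which cannot be scripted safely here, under the operator's control.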

PROCEDURE FOR REPLACING DISKS IN A RAID-5 META-DEVICE

Note: If a disk is used in BOTH a mirror and a RAID5 device, do not use the
following procedure. Instead, follow the instructions for mirrored devices
(above). This is because the RAID5 array, just healed, is treated as a single
disk for mirroring purposes.

To replace an SVM-controlled disk which is part of a RAID5 meta-device, the
following steps must be followed.

1. If there are any open filesystems on this disk (not under SVM control),
unmount them. If the disk or a slice on the disk is being used
as a dump device, move it temporarily to another disk. You can check the
existing dump device by running "dumpadm" and change the current dump device
using "dumpadm -d <dump-device>". If this is not done the disk will fail to
unconfigure.

2. If there are any replicas on this disk, remove them using:

metadb -d c#t#d#s#

Verify there are no existing replicas left on the disk by running:

metadb | grep c#t#d#

3. Run the 'cfgadm' command to remove the failed disk.

cfgadm -c unconfigure c#::dsk/c#t#d#

NOTE: To replace internal FC-AL disks, follow Technical Instruction < Solution: 214845 >

4. Insert and configure the new disk.

cfgadm -c configure c#::dsk/c#t#d#
cfgadm -al (to confirm that the disk is configured properly)

5. Run 'format', or 'prtvtoc' together with 'fmthard', to put the desired
partition table on the new disk. ('prtvtoc' only prints a partition table;
'fmthard' writes it.)

6. If necessary, recreate any replicas on the new disk:

metadb -a c#t#d#s#

7. Run metareplace to enable and resync the new disk*.

metareplace -e <raid5-md> c#t#d#s#

8. Run 'metadevadm' on the disk to update the SVM database with the new DevID.

metadevadm -u c#t#d#

Note: Due to CR 4808079, a disk can show up as "unavailable" in the
metastat command output after running Step 7. To resolve this, run "metastat -i".
After running this command, the device should show a metastat status of "Okay".
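
The RAID5 steps above can be sketched in the same dry-run style: the helper only
prints the commands for review. The function name and arguments are illustrative
and not part of SVM. It assumes at most one replica slice on the disk and an
Ap_Id of the form c#::dsk/c#t#d# (confirm it with "cfgadm -al"). Step 1
(unmounting filesystems and moving the dump device) is not covered.

```shell
#!/bin/sh
# Dry-run sketch: print (never execute) the RAID5 replacement commands.
# raid5_replace_plan is a hypothetical helper, not an SVM command.
# Usage: raid5_replace_plan <disk> <replica-slice or "-"> <raid5-md> <component-slice>
raid5_replace_plan() {
    disk=$1
    replica=$2
    raid=$3
    slice=$4
    # Assumption: the Ap_Id follows the c#::dsk/c#t#d# pattern used above.
    apid="${disk%%t*}::dsk/${disk}"

    if [ "$replica" != "-" ]; then
        echo "metadb -d $replica"              # step 2
    fi
    echo "cfgadm -c unconfigure $apid"         # step 3
    echo "# swap the disk, then:"
    echo "cfgadm -c configure $apid"           # step 4
    echo "# partition the new disk (step 5), then:"
    if [ "$replica" != "-" ]; then
        echo "metadb -a $replica"              # step 6
    fi
    echo "metareplace -e $raid $slice"         # step 7
    echo "metadevadm -u $disk"                 # step 8
    echo "# if metastat reports the component as unavailable: metastat -i"
}

# Values taken from Example 2 of this document.
raid5_replace_plan c0t2d0 c0t2d0s7 d3 c0t2d0s5
```

Unlike the mirror procedure, nothing is detached or cleared here: the RAID5
device keeps its definition and 'metareplace -e' re-enables the same slice.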

EXAMPLES

The following two examples illustrate the commands and sample outputs of the
above procedures.

Example 1: Replacing a Mirrored Disk

In this example, a Netra[TM] t 1400 Server has only one SCSI controller, with 4
disks. SVM is used to mirror both the root and the swap devices between c0t0d0
and c0t2d0. The disk c0t2d0 is failing and needs to be replaced.

Here is the 'format' display before the submirror disk replacement:

format

AVAILABLE DISK SELECTIONS:
0. c0t0d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248>
/pci@1f,4000/scsi@3/sd@0,0
1. c0t1d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248>
/pci@1f,4000/scsi@3/sd@1,0
2. c0t2d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248>
/pci@1f,4000/scsi@3/sd@2,0
3. c0t3d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248>
/pci@1f,4000/scsi@3/sd@3,0

Here is the 'cfgadm' display for controller c0:

cfgadm -al
Ap_Id            Type       Receptacle   Occupant     Condition
c0               scsi-bus   connected    configured   unknown
c0::dsk/c0t0d0   disk       connected    configured   unknown
c0::dsk/c0t1d0   disk       connected    configured   unknown
c0::dsk/c0t2d0   disk       connected    configured   unknown
c0::dsk/c0t3d0   disk       connected    configured   unknown
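
The Ap_Id needed by 'cfgadm -c unconfigure' can be picked out of this listing
with a small filter. The following is a minimal sketch; apid_for_disk is a
hypothetical helper name, and the sample input is copied from the listing above.
On Solaris, nawk or /usr/xpg4/bin/awk may be needed, since the older
/usr/bin/awk does not accept -v.

```shell
#!/bin/sh
# Sketch: print the Ap_Id of a disk from `cfgadm -al` output read on stdin.
# apid_for_disk is a hypothetical helper name, not a Solaris command.
apid_for_disk() {
    # Match only "disk" rows whose Ap_Id ends in ::dsk/<disk>.
    awk -v d="$1" '$2 == "disk" && $1 ~ ("::dsk/" d "$") { print $1 }'
}

# On a live system: cfgadm -al | apid_for_disk c0t2d0
apid_for_disk c0t2d0 <<'EOF'
Ap_Id Type Receptacle Occupant Condition
c0 scsi-bus connected configured unknown
c0::dsk/c0t0d0 disk connected configured unknown
c0::dsk/c0t1d0 disk connected configured unknown
c0::dsk/c0t2d0 disk connected configured unknown
c0::dsk/c0t3d0 disk connected configured unknown
EOF
```

Anchoring the match at the end of the Ap_Id avoids picking up a similarly named
disk such as c0t20d0.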

Here is the output of the 'metadb' command, showing the locations of the SVM
database replicas. There is one on each disk.

metadb
flags      first blk   block count
a    u     16          8192          /dev/dsk/c0t0d0s7
a    u     16          8192          /dev/dsk/c0t1d0s7
a    u     16          8192          /dev/dsk/c0t2d0s7
a    u     16          8192          /dev/dsk/c0t3d0s7
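
To find which replica slices have to be removed with 'metadb -d' before a disk
is unconfigured, the 'metadb' output can be filtered by disk name. This is a
minimal sketch; replicas_on_disk is a hypothetical helper name, and it assumes
the device path is the last field of each replica line, as in the listing above.
On Solaris, nawk or /usr/xpg4/bin/awk may be needed, since the older
/usr/bin/awk does not accept -v.

```shell
#!/bin/sh
# Sketch: list the replica slices that live on one disk, from `metadb`
# output read on stdin. replicas_on_disk is a hypothetical helper name.
replicas_on_disk() {
    awk -v d="$1" '$NF ~ ("^/dev/dsk/" d "s[0-9]+$") {
        sub("^/dev/dsk/", "", $NF)   # keep only the c#t#d#s# part
        print $NF
    }'
}

# On a live system: metadb | replicas_on_disk c0t2d0
replicas_on_disk c0t2d0 <<'EOF'
flags first blk block count
a u 16 8192 /dev/dsk/c0t0d0s7
a u 16 8192 /dev/dsk/c0t1d0s7
a u 16 8192 /dev/dsk/c0t2d0s7
a u 16 8192 /dev/dsk/c0t3d0s7
EOF
```

Each slice printed is an argument for 'metadb -d' now and for 'metadb -a' once
the new disk has been partitioned.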

Here is the SVM configuration before the submirror disk replacement.

Note: The DevID information is at the bottom.

metastat
d0: Mirror
Submirror 0: d10
State: Okay
Submirror 1: d20
State: Needs maintenance
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 6295232 blocks (3.0 GB)

d10: Submirror of d0
State: Okay
Size: 6295232 blocks (3.0 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c0t0d0s0 0 No Okay Yes

d20: Submirror of d0
State: Needs maintenance
Invoke: metareplace d20 c0t2d0s0 <new device>
Size: 6295232 blocks (3.0 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c0t2d0s0 0 No Maintenance Yes

d1: Mirror
Submirror 0: d11
State: Okay
Submirror 1: d21
State: Needs maintenance
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 2101552 blocks (1.0 GB)

d11: Submirror of d1
State: Okay
Size: 2101552 blocks (1.0 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c0t0d0s1 0 No Okay Yes

d21: Submirror of d1
State: Needs maintenance
Invoke: metareplace d21 c0t2d0s1 <new device>
Size: 2101552 blocks (1.0 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c0t2d0s1 0 No Maintenance Yes

Device Relocation Information:
Device   Reloc  Device ID
c0t2d0   Yes    id1,sd@SFUJITSU_MAG3182L_SUN18G_00526202____
c0t0d0   Yes    id1,sd@SSEAGATE_ST318203LSUN18G_LR795377000010210UN3
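
The DevID that SVM has recorded for a disk can be pulled out of the "Device
Relocation Information" section, which makes it easy to compare the value
before and after a replacement. This is a minimal sketch; devid_for_disk is a
hypothetical helper name, and it assumes the three-column layout shown above.
On Solaris, nawk or /usr/xpg4/bin/awk may be needed, since the older
/usr/bin/awk does not accept -v.

```shell
#!/bin/sh
# Sketch: print the DevID recorded for one disk, from `metastat` output
# read on stdin. devid_for_disk is a hypothetical helper name.
devid_for_disk() {
    awk -v d="$1" '
        /^Device Relocation Information:/ { in_reloc = 1; next }
        in_reloc && $1 == d { print $3 }   # columns: Device, Reloc, Device ID
    '
}

# On a live system: metastat | devid_for_disk c0t2d0
devid_for_disk c0t2d0 <<'EOF'
Device Relocation Information:
Device Reloc Device ID
c0t2d0 Yes id1,sd@SFUJITSU_MAG3182L_SUN18G_00526202____
c0t0d0 Yes id1,sd@SSEAGATE_ST318203LSUN18G_LR795377000010210UN3
EOF
```

Saving this value before the swap and comparing it afterwards shows whether SVM
has picked up the new disk's DevID.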

Since c0t2d0 is the drive that needs to be replaced, use 'metadetach' and
'metaclear' to detach and remove the bad submirrors from that disk.

metadetach -f d0 d20
d0: submirror d20 is detached

metadetach -f d1 d21
d1: submirror d21 is detached

metaclear d20
d20: Concat/Stripe is cleared

metaclear d21
d21: Concat/Stripe is cleared

Here is the 'metastat' output after detaching and removing d20 and d21:

d0: Mirror
Submirror 0: d10
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 6295232 blocks (3.0 GB)

d10: Submirror of d0
State: Okay
Size: 6295232 blocks (3.0 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c0t0d0s0 0 No Okay Yes

d1: Mirror
Submirror 0: d11
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 2101552 blocks (1.0 GB)

d11: Submirror of d1
State: Okay
Size: 2101552 blocks (1.0 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c0t0d0s1 0 No Okay Yes

Device Relocation Information:
Device   Reloc  Device ID
c0t2d0   Yes    id1,sd@SFUJITSU_MAG3182L_SUN18G_00526202____
c0t0d0   Yes    id1,sd@SSEAGATE_ST318203LSUN18G_LR795377000010210UN3

Since there is a database replica on the disk to be removed, remove it using:

metadb -d c0t2d0s7

and then remove the failed disk from the system using:

cfgadm -c unconfigure c0::dsk/c0t2d0

After the disk has been physically replaced, use the 'cfgadm' command to
configure the new disk:

cfgadm -c configure c0::dsk/c0t2d0

and then confirm that the new disk has been configured:

cfgadm -al
Ap_Id            Type       Receptacle   Occupant     Condition
c0               scsi-bus   connected    configured   unknown
c0::dsk/c0t0d0   disk       connected    configured   unknown
c0::dsk/c0t1d0   disk       connected    configured   unknown
c0::dsk/c0t2d0   disk       connected    configured   unknown
c0::dsk/c0t3d0   disk       connected    configured   unknown

Then run 'format' to put the appropriate partition table onto the disk.

format

[ the steps to create a valid partition table have been left out for brevity ]
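
As an alternative to partitioning by hand in 'format', the partition table of
the surviving mirror disk can be copied to the replacement with 'prtvtoc' and
'fmthard'. The sketch below only prints the pipeline so it can be reviewed
first; vtoc_copy_cmd is a hypothetical helper name. It assumes both disks have
the same geometry (all four disks in this example are SUN18G) and that slice 2
covers the whole disk.

```shell
#!/bin/sh
# Sketch: print the pipeline that copies the partition table (VTOC) from a
# surviving disk to its replacement. vtoc_copy_cmd is a hypothetical helper;
# it prints the command instead of running it.
vtoc_copy_cmd() {
    src=$1   # disk to read the partition table from, e.g. c0t0d0
    dst=$2   # newly inserted disk, e.g. c0t2d0
    echo "prtvtoc /dev/rdsk/${src}s2 | fmthard -s - /dev/rdsk/${dst}s2"
}

vtoc_copy_cmd c0t0d0 c0t2d0
```

Check the source and destination carefully before running the printed command:
reversing them would overwrite the partition table of the healthy disk.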

Run 'metadb' to recreate the replica that we removed from the disk:

metadb -a c0t2d0s7

and run 'metainit' to recreate the metadevices that were previously removed and
'metattach' to reattach them to their respective mirrors.

metainit d20 1 1 c0t2d0s0
d20: Concat/Stripe is setup

metainit d21 1 1 c0t2d0s1
d21: Concat/Stripe is setup

metattach d0 d20
d0: submirror d20 is attached

metattach d1 d21
d1: submirror d21 is attached

Running a 'metastat' command will now show the NEW DeviceID for disk c0t2d0:

metastat
d0: Mirror
Submirror 0: d10
State: Okay
Submirror 1: d20
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 6295232 blocks (3.0 GB)

d10: Submirror of d0
State: Okay
Size: 6295232 blocks (3.0 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c0t0d0s0 0 No Okay Yes

d20: Submirror of d0
State: Okay
Size: 6295232 blocks (3.0 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c0t2d0s0 0 No Okay Yes

d1: Mirror
Submirror 0: d11
State: Okay
Submirror 1: d21
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 2101552 blocks (1.0 GB)

d11: Submirror of d1
State: Okay
Size: 2101552 blocks (1.0 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c0t0d0s1 0 No Okay Yes

d21: Submirror of d1
State: Okay
Size: 2101552 blocks (1.0 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c0t2d0s1 0 No Okay Yes

Device Relocation Information:
Device   Reloc  Device ID
c0t0d0   Yes    id1,sd@SFUJITSU_MAG3182L_SUN18G_00526873____
c0t2d0   Yes    id1,sd@SSEAGATE_ST318203LSUN18G_LR7943000000W70708e0

After the new disk is attached to the mirror disk, it will be resynchronized.

Run 'metadevadm' to update the SVM database with the new DevID information.
Here the old DevID and the new DevID are the same, because the DevID was already
updated when all metadevices and SVM database replicas on c0t2d0 were removed
and recreated:

metadevadm -u c0t2d0
Updating Solaris Volume Manager device relocation information for c0t2d0
Old device reloc information:
id1,sd@SSEAGATE_ST318203LSUN18G_LR7943000000W70708e0
New device reloc information:
id1,sd@SSEAGATE_ST318203LSUN18G_LR7943000000W70708e0

Once the resynchronization process is completed, the mirror disk will be back to
fully redundant mode.

Example 2: Replacing a Disk Used Only in RAID5 Metadevice(s)

In this example, a Netra[TM] t 1400 server has only one SCSI controller with 4
disks. A RAID5 SVM configuration is set up across three disks - c0t1d0, c0t2d0
and c0t3d0. The disk c0t2d0 is failing, and needs to be replaced.

Here is the 'format' display before the disk replacement:

format
AVAILABLE DISK SELECTIONS:
0. c0t0d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248>
/pci@1f,4000/scsi@3/sd@0,0
1. c0t1d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248>
/pci@1f,4000/scsi@3/sd@1,0
2. c0t2d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248>
/pci@1f,4000/scsi@3/sd@2,0
3. c0t3d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248>
/pci@1f,4000/scsi@3/sd@3,0

Here is the 'cfgadm' display for controller c0:

cfgadm -al
Ap_Id            Type       Receptacle   Occupant     Condition
c0               scsi-bus   connected    configured   unknown
c0::dsk/c0t0d0   disk       connected    configured   unknown
c0::dsk/c0t1d0   disk       connected    configured   unknown
c0::dsk/c0t2d0   disk       connected    configured   unknown
c0::dsk/c0t3d0   disk       connected    configured   unknown

Here is the output of the 'metadb' command, showing the locations of the SVM
database replicas. There is one on each disk.

metadb
flags      first blk   block count
a    u     16          8192          /dev/dsk/c0t0d0s7
a    u     16          8192          /dev/dsk/c0t1d0s7
a    u     16          8192          /dev/dsk/c0t2d0s7
a    u     16          8192          /dev/dsk/c0t3d0s7

Here is the SVM configuration before the disk replacement.

Note: The DevID information is at the bottom.

metastat
d3: RAID
State: Needs Maintenance
Invoke: metareplace d3 c0t2d0s5 <new device>
Interlace: 32 blocks
Size: 2077992 blocks (1014 MB)
Original device:
Size: 2081984 blocks (1016 MB)
Device Start Block Dbase State Reloc Hot Spare
c0t1d0s5 9754 No Okay Yes
c0t2d0s5 9754 No Maintenance Yes
c0t3d0s5 9754 No Okay Yes

Device Relocation Information:
Device   Reloc  Device ID
c0t0d0   Yes    id1,sd@SSEAGATE_ST318203LSUN18G_LR795377000010210UN3
c0t1d0   Yes    id1,sd@SFUJITSU_MAG3182L_SUN18G_00526873____
c0t2d0   Yes    id1,sd@SFUJITSU_MAG3182L_SUN18G_00526202____
c0t3d0   Yes    id1,sd@SFUJITSU_MAG3182L_SUN18G_00526842____
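
The slice to pass to 'metareplace -e' is the component that 'metastat' reports
in the "Maintenance" state. A minimal sketch for listing such components
follows; components_in_maintenance is a hypothetical helper name, and it assumes
component rows start with a c#t#d#s# name and carry the state as a separate
field, as in the output above. Other error states are not handled.

```shell
#!/bin/sh
# Sketch: list component slices shown in the "Maintenance" state, from
# `metastat` output read on stdin. components_in_maintenance is a
# hypothetical helper name.
components_in_maintenance() {
    awk '$1 ~ /^c[0-9]+t[0-9]+d[0-9]+s[0-9]+$/ {
        for (i = 2; i <= NF; i++)
            if ($i == "Maintenance") { print $1; break }
    }'
}

# On a live system: metastat d3 | components_in_maintenance
components_in_maintenance <<'EOF'
d3: RAID
State: Needs Maintenance
Device Start Block Dbase State Reloc Hot Spare
c0t1d0s5 9754 No Okay Yes
c0t2d0s5 9754 No Maintenance Yes
c0t3d0s5 9754 No Okay Yes
EOF
```

Restricting the match to rows that begin with a slice name keeps the "State:
Needs Maintenance" summary line from being reported as a component.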

Since c0t2d0 is the drive that needs to be replaced, and since the only other
thing on this disk is the SVM replica, remove the existing replica on disk
c0t2d0 using:

metadb -d c0t2d0s7

and use the 'cfgadm' command to remove the failed disk from the system:

cfgadm -c unconfigure c0::dsk/c0t2d0

After the disk has been physically replaced, we use 'cfgadm' to configure the
new disk:

cfgadm -c configure c0::dsk/c0t2d0

and then confirm that the new disk has been configured:

cfgadm -al
Ap_Id            Type       Receptacle   Occupant     Condition
c0               scsi-bus   connected    configured   unknown
c0::dsk/c0t0d0   disk       connected    configured   unknown
c0::dsk/c0t1d0   disk       connected    configured   unknown
c0::dsk/c0t2d0   disk       connected    configured   unknown
c0::dsk/c0t3d0   disk       connected    configured   unknown

Then, run 'format' to put the appropriate partition table onto the disk.

format

[ the steps to create a valid partition table have been left out for brevity ]

Run 'metadb' to recreate the replica that we removed from the disk:

metadb -a c0t2d0s7

Run 'metareplace' to add the new disk into the RAID5 device and start the
resync:

metareplace -e d3 c0t2d0s5

Run 'metadevadm' to update the SVM database with the new DevID information.
Here, the old DevID and the new DevID can be seen:

metadevadm -u c0t2d0
Old device reloc information:
id1,sd@SFUJITSU_MAG3182L_SUN18G_00526202____
New device reloc information:
id1,sd@SSEAGATE_ST318203LSUN18G_LR7943000000W70708e0
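
Whether 'metadevadm -u' actually changed the recorded DevID can be read off its
output by comparing the "Old" and "New" values. A minimal sketch follows;
devid_change is a hypothetical helper name, and it assumes each DevID appears on
the line after its label, as in the output above.

```shell
#!/bin/sh
# Sketch: report whether `metadevadm -u` output (read on stdin) shows a
# changed DevID. devid_change is a hypothetical helper name.
devid_change() {
    awk '
        /Old device reloc information:/ { want = "old"; next }
        /New device reloc information:/ { want = "new"; next }
        want == "old" && NF { oldid = $1; want = "" }
        want == "new" && NF { newid = $1; want = "" }
        END {
            if (oldid == "" || newid == "") {
                print "unknown"
            } else if (oldid == newid) {
                print "unchanged"
            } else {
                print "changed"
            }
        }
    '
}

# On a live system: metadevadm -u c0t2d0 | devid_change
devid_change <<'EOF'
Old device reloc information:
id1,sd@SFUJITSU_MAG3182L_SUN18G_00526202____
New device reloc information:
id1,sd@SSEAGATE_ST318203LSUN18G_LR7943000000W70708e0
EOF
```

For the RAID5 case a changed DevID is expected. For the mirror procedure in
Example 1, the two values are the same because the metadevices and replicas were
removed and recreated.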

Running a 'metastat' command will now show the NEW DeviceID for disk c0t2d0:

metastat

d3: RAID
State: Okay
Interlace: 32 blocks
Size: 2077992 blocks (1014 MB)
Original device:
Size: 2081984 blocks (1016 MB)
Device Start Block Dbase State Reloc Hot Spare
c0t1d0s5 9754 No Okay Yes
c0t2d0s5 9754 No Okay Yes
c0t3d0s5 9754 No Okay Yes

Device Relocation Information:
Device   Reloc  Device ID
c0t1d0   Yes    id1,sd@SFUJITSU_MAG3182L_SUN18G_00526873____
c0t2d0   Yes    id1,sd@SSEAGATE_ST318203LSUN18G_LR7943000000W70708e0
c0t3d0   Yes    id1,sd@SFUJITSU_MAG3182L_SUN18G_00526842____

Product

Solstice DiskSuite 4.2.1
Solaris Volume Manager Software
