SVM/SDS Using Hot-spare Disks

The Solaris Volume Manager (SVM) supports hot sparing through hot spare pools, which are collections of devices reserved for use as hot spares. A pool can be associated with one or more meta devices. Pools are created and populated with the metahs(1m) utility:

$ metahs -a hsp001 c1t5d0s0

$ metahs -a hsp001 c1t6d0s0

This example creates a new hot spare pool named hsp001 and assigns two devices to it. We can view the contents of the hot spare pools with the "-i" option to metahs:

$ metahs -i
hsp001: 2 hot spares
Device      Status      Length            Reloc
c1t6d0s0    Available   35523720 blocks   Yes
c1t5d0s0    Available   35523720 blocks   Yes

The output lists both devices currently assigned to the pool. The Status field indicates whether a spare is available or is actively replacing a faulted device. Once a hot spare pool has been created, it must be attached to a meta device with the metaparam(1m) utility:

$ metaparam -h hsp001 d5

This attaches the hot spare pool hsp001 to the meta device d5. To see which hot spare pool is attached to a meta device, run metastat(1m) and look for the "Hot spare pool" attribute:

$ metastat d5
d5: RAID
State: Okay
Hot spare pool: hsp001
Interlace: 128 blocks
Size: 106085968 blocks (50 GB)
Original device:
Size: 106086528 blocks (50 GB)
Device      Start Block   Dbase   State       Reloc   Hot Spare
c1t1d0s0    6002          No      Okay        Yes
c1t2d0s0    4926          No      Okay        Yes
c1t3d0s0    4926          No      Okay        Yes
c1t4d0s0    4926          No      Okay        Yes
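
The two checks above (how many spares are free, and which pool a meta device uses) can be scripted. The following is a minimal sketch, not part of SVM: the function names spares_available and attached_pool are made up for illustration, and the parsing assumes the column layout shown in the sample output above.

```shell
# Count spares marked "Available" in `metahs -i` output read from stdin.
# Assumes rows of the form: <cNtNdNsN> <Status> <Length> blocks <Reloc>.
spares_available() {
    awk '$1 ~ /^c[0-9]+t[0-9]+d[0-9]+s[0-9]+$/ && $2 == "Available" { n++ }
         END { print n + 0 }'
}

# Print the pool named on the first "Hot spare pool:" line of
# `metastat` output read from stdin (prints nothing if none is attached).
attached_pool() {
    awk '/Hot spare pool:/ && !seen { print $NF; seen = 1 }'
}
```

Typical use would be "metahs -i | spares_available" and "metastat d5 | attached_pool". Reading from standard input keeps the functions independent of the SVM commands themselves, so they can be tried against saved output first.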

When a disk fails, the kernel will usually log errors similar to the following:

Jul 1 22:42:52 tigger scsi: [ID 107833 kern.warning] WARNING: /pci@1f,0/pci@1/scsi@4/sd@2,0 (sd3):
Jul 1 22:42:52 tigger Error for Command: read(10) Error Level: Fatal
Jul 1 22:42:52 tigger scsi: [ID 107833 kern.notice] Requested Block: 26672702 Error Block: 26672733
Jul 1 22:42:52 tigger scsi: [ID 107833 kern.notice] Vendor: SEAGATE Serial Number: NM020253
Jul 1 22:42:52 tigger scsi: [ID 107833 kern.notice] Sense Key: Media Error
Jul 1 22:42:52 tigger scsi: [ID 107833 kern.notice] ASC: 0x11 (unrecovered read error), ASCQ: 0x0, FRU: 0xe4
Jul 1 22:42:52 tigger md_raid: [ID 371651 kern.warning] WARNING: md d5: write error on /dev/dsk/c1t2d0s0
Jul 1 22:42:52 tigger md_raid: [ID 104909 kern.warning] WARNING: md: d5: /dev/dsk/c1t2d0s0 needs maintenance
Jul 1 22:42:53 tigger md_raid: [ID 241980 kern.notice] NOTICE: md: d5: hotspared device /dev/dsk/c1t2d0s0 with /dev/dsk/c1t6d0s0

This output indicates that the block device c1t2d0s0 failed and the hot spare c1t6d0s0 took over. Because d5 is a RAID5 meta device, recovery is expensive: the failed member's data and parity must be reconstructed on the hot spare from the remaining members. We can monitor the rebuild process by running the metastat(1m) command:

$ metastat d5
d5: RAID
State: Resyncing
Resync in progress: 1.6% done
Hot spare pool: hsp001
Interlace: 128 blocks
Size: 106085968 blocks (50 GB)
Original device:
Size: 106086528 blocks (50 GB)
Device      Start Block   Dbase   State       Reloc   Hot Spare
c1t1d0s0    6002          No      Okay        Yes
c1t2d0s0    4926          No      Resyncing   Yes     c1t6d0s0
c1t3d0s0    4926          No      Okay        Yes
c1t4d0s0    4926          No      Okay        Yes
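
When several meta devices are involved, it can help to reduce the metastat output to just the resync progress and the components that are currently running on a spare. A rough sketch follows; resync_report is a hypothetical helper name, and it assumes the component rows carry the hot spare name as an optional sixth field, as in the output above.

```shell
# Summarise `metastat` output read from stdin:
#   - the percentage from the "Resync in progress:" line, if present
#   - each component row that names a hot spare in its sixth field
resync_report() {
    awk '
        /Resync in progress:/ { print "progress " $4 }
        $1 ~ /^c[0-9]+t[0-9]+d[0-9]+s[0-9]+$/ && NF >= 6 {
            print $1 " -> " $6 " (" $4 ")"
        }'
}
```

Running "metastat d5 | resync_report" against the output above would print the progress figure followed by one line per spared component.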

At some point you will most likely want to replace the failed drive and migrate the data from the hot spare back to the original device. This usually requires replacing the physical drive, updating the Solaris device tree with cfgadm(1m) and devfsadm(1m), and running metadevadm(1m) to update the device relocation information in the state database:

$ metadevadm -v -u c1t2d0s0
Updating Solaris Volume Manager device relocation information for c1t2d0
Old device reloc information:
        id1,sd@SSEAGATE_SX318203LC______LR869054____102424W2
New device reloc information:
        id1,sd@SSEAGATE_SX318203LC______LRA45701____10272998

Once these activities complete, the metareplace(1m) utility can be used to "swap" the hot spare with the original device:

$ metareplace -e d5 c1t2d0s0
d5: device c1t2d0s0 is enabled

$ metastat d5
d5: RAID
State: Resyncing
Resync in progress: 0.1% done
Hot spare pool: hsp001
Interlace: 32 blocks
Size: 106085968 blocks (50 GB)
Original device:
Size: 106089600 blocks (50 GB)
Device      Start Block   Dbase   State       Reloc   Hot Spare
c1t1d0s0    5042          No      Okay        Yes
c1t2d0s0    3966          No      Resyncing   Yes     c1t5d0s0
c1t3d0s0    3966          No      Okay        Yes
c1t4d0s0    3966          No      Okay        Yes
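
Rather than re-running metastat by hand, the resync can be polled from a script. This is only a sketch under stated assumptions: wait_for_resync is a hypothetical name, the METASTAT variable is an invented hook for substituting a different command during testing, and the default 60-second interval is arbitrary.

```shell
# Poll a meta device until its first "State:" line no longer reads
# "Resyncing", then print the final state.
# Returns 0 only if that final state is "Okay".
#   $1 = meta device name, $2 = poll interval in seconds (default 60)
# METASTAT is a hypothetical override; it defaults to the real metastat.
wait_for_resync() {
    md=$1
    interval=${2:-60}
    while :; do
        state=$("${METASTAT:-metastat}" "$md" |
            awk '$1 == "State:" && !seen {
                     sub(/^[ \t]*State:[ \t]*/, ""); print; seen = 1
                 }')
        [ "$state" != "Resyncing" ] && break
        sleep "$interval"
    done
    echo "$md: $state"
    [ "$state" = "Okay" ]
}
```

For example, "wait_for_resync d5 30" would check d5 every 30 seconds. If metastat fails or prints no State line, the loop stops immediately and the function returns non-zero.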

Once the data has been rebuilt on the original device, the Hot Spare column is empty again and the metastat(1m) output returns to normal:

$ metastat d5
d5: RAID
State: Okay
Hot spare pool: hsp001
Interlace: 32 blocks
Size: 106085968 blocks (50 GB)
Original device:
Size: 106089600 blocks (50 GB)
Device      Start Block   Dbase   State       Reloc   Hot Spare
c1t1d0s0    5042          No      Okay        Yes
c1t2d0s0    3966          No      Okay        Yes
c1t3d0s0    3966          No      Okay        Yes
c1t4d0s0    3966          No      Okay        Yes
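
The "back to normal" condition can also be expressed as an exit status, which is convenient in cron jobs or monitoring checks. The sketch below uses a made-up function name, recovered, and treats a meta device as healthy only when its State line reads Okay, every component row reads Okay, and no row still names a hot spare. It assumes the same column layout as the output above.

```shell
# Exit 0 if `metastat` output on stdin shows a fully recovered meta device:
# "State: Okay", every component Okay, and no hot spare in the sixth field.
recovered() {
    awk '
        $1 == "State:" && $2 != "Okay" { bad = 1 }
        $1 ~ /^c[0-9]+t[0-9]+d[0-9]+s[0-9]+$/ && ($4 != "Okay" || NF >= 6) {
            bad = 1
        }
        END { exit bad + 0 }'
}
```

A possible use is "metastat d5 | recovered || echo 'd5 needs attention'".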
