Replacing of failed drive in vinum RAID5

May 12, 2007 17:26


Two years and one month after I set up a vinum RAID5 array, one of its disks failed:

    Apr 5 15:51:23 heart /kernel: ad4s1h: hard error reading fsbn 483750673 of 241875305-241875336 (ad4s1 bn 483750673; cn 479911 tn 6 sn 7)
    Apr 5 15:51:23 heart /kernel: ad4: timeout waiting for cmd=ef s=e0 e=04
    {... repeating many times
    Apr 5 15:51:23 heart /kernel: ad4: timeout sending command=c4 s=e0 e=04
    Apr 5 15:51:23 heart /kernel: ad4: error executing command - resetting
    Apr 5 15:51:23 heart /kernel: ad4: timeout sending command=ec s=80 e=04
    Apr 5 15:51:24 heart /kernel: ad4: ATA identify failed
    Apr 5 15:51:24 heart /kernel: ad4: timeout sending command=c6 s=80 e=04
    }
    Apr 5 15:51:24 heart /kernel: vinum: Can't write config to /dev/ad4s1h, error 5
    Apr 5 15:51:24 heart /kernel: ad4: timeout sending command=e7 s=80 e=04
    Apr 5 15:51:24 heart /kernel: ad4: flushing cache on close failed

A new 200GB IDE WD Caviar now costs $68 (it was $107).

dmesg after the replacement:

    ad2: 190782MB [387621/16/63] at ata1-master UDMA33
    ad4: 190782MB [387621/16/63] at ata2-master UDMA100
    ad6: 190782MB [387621/16/63] at ata3-master UDMA100
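As a sanity check, the geometry in these dmesg lines (cylinders/heads/sectors) can be multiplied out to get the disk size, and it also explains the partition size 390721905 used in the disklabel step of the recovery. This is a small worked sketch assuming 512-byte sectors; the 63-sector offset is an assumption that fdisk -I started the slice after the first track, which the numbers are consistent with.

```shell
#!/bin/sh
# Geometry reported by dmesg for each 200GB disk: cylinders/heads/sectors.
cyls=387621
heads=16
spt=63

# Total 512-byte sectors on the disk.
total=$((cyls * heads * spt))

# Capacity in MB (1 MB = 1048576 bytes = 2048 sectors), truncated like dmesg.
mb=$((total / 2048))

# Assumption: fdisk -I starts the FreeBSD slice at sector 63 (after the
# first track), so the slice, and partitions 'c' and 'h' inside it,
# span total - 63 sectors.
slice=$((total - spt))

echo "total=$total mb=$mb slice=$slice"
```

The result is 390721968 sectors, 190782MB (matching dmesg), and a 390721905-sector slice, which is exactly the size shown for 'c' and 'h' in the label.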

'vinum printconfig' shows that vinum has lost the failed disk:

    ...
    drive b device          [nothing here]
    ...
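On an array with many drives, such an orphaned entry can be spotted by filtering the printconfig output. This is a minimal sketch: it assumes drive lines have the form "drive <name> device <path> ..." as in the excerpt, and missing_drives is a hypothetical helper name, not part of vinum.

```shell
#!/bin/sh
# Hypothetical helper: print the names of vinum drives with no device.
# Assumes printconfig lines look like "drive <name> device <path> ...";
# a lost drive has nothing after the word "device".
missing_drives() {
    awk '$1 == "drive" && $3 == "device" && NF < 4 { print $2 }'
}

# Usage on a live system (vinum must be loaded):
#   vinum printconfig | missing_drives
```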

So the recovery procedure went like this:

    # fdisk -BI ad4
        (ignore "invalid mbr")
    # disklabel -wB ad4s1 auto
    # disklabel -e ad4s1
        (add new partition 'h' equal to 'c' with fstype vinum:)
    8 partitions:
    #        size   offset    fstype   [fsize bsize bps/cpg]
      c: 390721905        0    unused        0     0         # (Cyl.    0 - 387620*)
      h: 390721905        0     vinum                        # (Cyl.    0 - 387620*)
    # cat vinum.tmp.conf
    drive b device /dev/ad4s1h
    # vinum create vinum.tmp.conf
    # rm vinum.tmp.conf
    # vinum start raid5.p0.s1
        (the subdisk that was "stalled")
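The vinum part of these steps (write a one-line drive config, create it, remove the temp file, restart the subdisk) could be wrapped in a small script. This is a sketch under assumptions: replace_drive, write_drive_conf, run and DRY_RUN are hypothetical names, the temp file location is arbitrary, and fdisk/disklabel are assumed to have been done by hand first. Only the vinum commands already used above are called.

```shell
#!/bin/sh
# Sketch: re-attach a replacement disk to an existing vinum drive name
# and restart the subdisk on it. Set DRY_RUN=1 to only print the commands.

write_drive_conf() {
    # $1 = vinum drive name, $2 = device node, $3 = output file
    printf 'drive %s device %s\n' "$1" "$2" > "$3"
}

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

replace_drive() {
    # $1 = drive name, $2 = device node, $3 = subdisk to restart
    drive=$1 dev=$2 subdisk=$3
    conf=$(mktemp /tmp/vinum.XXXXXX) || return 1
    write_drive_conf "$drive" "$dev" "$conf"
    run vinum create "$conf"
    rm -f "$conf"
    # Restarting the failed subdisk starts the RAID5 rebuild.
    run vinum start "$subdisk"
}

# Example (as root, after fdisk/disklabel prepared ad4s1h):
#   replace_drive b /dev/ad4s1h raid5.p0.s1
```

Keeping a dry-run switch makes it easy to check the generated commands before touching a degraded array.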

And now it is rebuilding (approx. 6 hours).
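For a rough idea of what 6 hours means in throughput, here is a back-of-the-envelope sketch. It assumes the rebuilt subdisk spans nearly the whole 190782MB drive, which the post does not state; rebuild_rate is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: average rebuild rate in MB/s.
# $1 = amount rebuilt in MB, $2 = duration in hours.
# Assumes the subdisk covers (almost) the whole drive.
rebuild_rate() {
    awk -v mb="$1" -v h="$2" 'BEGIN { printf "%.1f MB/s\n", mb / (h * 3600) }'
}

rebuild_rate 190782 6
```

That works out to roughly 8.8 MB/s on average.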

Upd. Two days later: a crash, after more than a year of trouble-free operation.

    IdlePTD at physical address 0x00454000
    initial pcb at physical address 0x00392600
    panicstr: ffs_blkfree: freeing free block
    panic messages:
    ---
    panic: ffs_blkfree: freeing free block
    syncing disks... 5 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
    giving up on 2 buffers

with this stack trace:

    #0 0xc01747fa in dumpsys ()
    #1 0xc01745c4 in boot ()
    #2 0xc01749f8 in poweroff_wait ()
    #3 0xc0266dff in ffs_blkfree ()
    #4 0xc026b934 in indir_trunc ()
    #5 0xc026b6e6 in handle_workitem_freeblocks ()
    #6 0xc0269b2b in process_worklist_item ()
    #7 0xc02699ba in softdep_process_worklist ()
    #8 0xc01a3e6b in sched_sync ()