RAID recovery

An alert mail arrived from the machine I run as a server: /dev/sdb appears to have failed.

This is an automatically generated mail message from mdadm
running on shrike

A DegradedArray event had been detected on md device /dev/md/0.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10] 
md0 : active raid1 sdb5[1](F) sda5[0]
      976141120 blocks super 1.2 [2/1] [U_]
      
unused devices: <none>
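
These alerts come from mdadm's monitor mode. As a rough sketch of how such mail gets configured (the path is Debian-style and the address is a placeholder; distribution packages normally set the monitor up for you), /etc/mdadm/mdadm.conf contains a line such as:

MAILADDR root@example.com

and the monitor itself is mdadm running as a daemon, e.g.:

# mdadm --monitor --scan --daemonise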

Removing the failed disk

Check the state of md0: /dev/sdb5 is marked faulty, which matches the [U_] (only one of the two mirror members up) shown in /proc/mdstat.

# mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Sun Apr 28 00:37:05 2013
     Raid Level : raid1
     Array Size : 976141120 (930.92 GiB 999.57 GB)
  Used Dev Size : 976141120 (930.92 GiB 999.57 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Sun Jul  8 22:37:21 2018
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 1
  Spare Devices : 0

           Name : shrike:0  (local to host shrike)
           UUID : a47e938c:26762b62:98253845:48c2acdd
         Events : 1035422

    Number   Major   Minor   RaidDevice State
       0       8        5        0      active sync   /dev/sda5
       2       0        0        2      removed

       1       8       21        -      faulty   /dev/sdb5
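
Here the kernel has already marked /dev/sdb5 as faulty, so it can be hot-removed directly. Had the member still been active, it would first need to be failed by hand, roughly:

# mdadm /dev/md0 -f /dev/sdb5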

Remove /dev/sdb5 from the array.

# mdadm /dev/md0 -r /dev/sdb5
mdadm: hot removed /dev/sdb5 from /dev/md0
# mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Sun Apr 28 00:37:05 2013
     Raid Level : raid1
     Array Size : 976141120 (930.92 GiB 999.57 GB)
  Used Dev Size : 976141120 (930.92 GiB 999.57 GB)
   Raid Devices : 2
  Total Devices : 1
    Persistence : Superblock is persistent

    Update Time : Sat Jul 14 19:18:26 2018
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           Name : shrike:0  (local to host shrike)
           UUID : a47e938c:26762b62:98253845:48c2acdd
         Events : 1810761

    Number   Major   Minor   RaidDevice State
       0       8        5        0      active sync   /dev/sda5
       2       0        0        2      removed
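
Before pulling the drive it is worth double-checking which physical unit /dev/sdb is and noting its exact model and serial number, for example with smartmontools (assuming the package is installed):

# smartctl -i /dev/sdb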

Record the partition layout in preparation for the rebuild.

# fdisk -l /dev/sdb
Disk /dev/sdb: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x00012eb6

Device     Boot  Start        End    Sectors  Size Id Type
/dev/sdb1  *      2048     976895     974848  476M 83 Linux
/dev/sdb2       978942 1953523711 1952544770  931G  5 Extended
/dev/sdb5       978944 1953523711 1952544768  931G fd Linux raid autodetect

Partition 2 does not start on physical sector boundary.
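
Besides the fdisk listing, the table can be dumped in a form that sfdisk can replay onto the replacement disk later (the file name here is arbitrary):

# sfdisk --dump /dev/sdb > sdb-partition-table.txt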

Recovery

Buy an HDD of the same model and install it in the server. Recreate the partition layout on the new disk and add it back to the array.
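
The partitioning step itself is not shown; one common approach is to replay the dump saved earlier, or to copy the layout straight from the surviving disk, assuming both drives share the same layout and the replacement is at least as large:

# sfdisk /dev/sdb < sdb-partition-table.txt

or

# sfdisk --dump /dev/sda | sfdisk /dev/sdb

With the partitions in place, add the new member to the array.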

# mdadm /dev/md0 -a /dev/sdb5
mdadm: added /dev/sdb5

The rebuild has started.

# mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Sun Apr 28 00:37:05 2013
     Raid Level : raid1
     Array Size : 976141120 (930.92 GiB 999.57 GB)
  Used Dev Size : 976141120 (930.92 GiB 999.57 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Sat Jul 14 19:53:19 2018
          State : clean, degraded, recovering
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1

 Rebuild Status : 0% complete

           Name : shrike:0  (local to host shrike)
           UUID : a47e938c:26762b62:98253845:48c2acdd
         Events : 1812089

    Number   Major   Minor   RaidDevice State
       0       8        5        0      active sync   /dev/sda5
       2       8       21        1      spare rebuilding   /dev/sdb5
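
Resync progress can be followed in /proc/mdstat, which also prints an estimated finish time; once it completes, the array should read [2/2] [UU] again. For example:

# watch -n 60 cat /proc/mdstat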

Also install the boot loader onto the newly added sdb.

# grub-install --target x86_64-efi /dev/sdb