Replacing A Failed Hard Drive In A Software RAID1 Array
Reference: https://www.howtoforge.com/replacing_hard_disks_in_a_raid1_array
This guide shows how to replace a failed drive in a Linux RAID1 (software RAID) array without losing data. In this example we have two drives: /dev/sda with partitions /dev/sda1 and /dev/sda2, and /dev/sdb with partitions /dev/sdb1 and /dev/sdb2.
Partitions /dev/sda1 and /dev/sdb1 make up the RAID1 set /dev/md0.
Partitions /dev/sda2 and /dev/sdb2 make up the RAID1 set /dev/md1.
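If you want a quick tree view of which partitions belong to which array, lsblk shows the whole stack at a glance (the column selection below is just one reasonable choice):
# lsblk -o NAME,SIZE,TYPE,MOUNTPOINT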
The healthy configuration can be examined by viewing /proc/mdstat:
# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1[0] sdb1[1]
511936 blocks super 1.0 [2/2] [UU]
bitmap: 0/1 pages [0KB], 65536KB chunk
md1 : active raid1 sda2[0] sdb2[1]
1951503360 blocks super 1.2 [2/2] [UU]
bitmap: 9/15 pages [36KB], 65536KB chunk
unused devices: <none>
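If you only want a quick yes/no check for degraded arrays (for example from a cron job), a one-liner like the following is a reasonable sketch; it greps the status strings in /proc/mdstat for a missing member and prints a warning if it finds one:
# grep -E '\[[U_]*_[U_]*\]' /proc/mdstat && echo "WARNING: degraded RAID array detected"
On a healthy system every array shows [UU], the grep finds nothing, and no warning is printed.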
How Do I Tell If A Hard Disk Has Failed?
If a disk fails, you will find kernel alert messages in the /var/log/messages log file, for example:
Apr 29 19:21:36 simplstor7 kernel: [kern.alert]md/raid1:md0: Disk failure on sda1, disabling device.
Apr 29 19:21:36 simplstor7 kernel: [kern.alert]md/raid1:md1: Disk failure on sda2, disabling device.
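Rather than waiting to notice these messages by hand, you can let mdadm watch the arrays and mail you when one degrades. A minimal sketch, assuming a working local mail setup and root@localhost as the recipient (the config file may be /etc/mdadm/mdadm.conf on Debian-based systems, and many distributions already run an mdmonitor service that does this for you):
# echo "MAILADDR root@localhost" >> /etc/mdadm.conf
# mdadm --monitor --scan --daemonise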
When you examine the RAID status by viewing /proc/mdstat it now shows:
# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda2[0](F) sdb2[1]
1951503360 blocks super 1.2 [2/1] [_U]
bitmap: 9/15 pages [36KB], 65536KB chunk
md0 : active raid1 sda1[0](F) sdb1[1]
511936 blocks super 1.0 [2/1] [_U]
bitmap: 1/1 pages [4KB], 65536KB chunk
unused devices: <none>
Instead of the string [UU] you will see [_U] (first member missing) or [U_] (second member missing) when an array is degraded. Use the mdadm command to look at the status of each MD array in detail:
# mdadm --detail /dev/md0
/dev/md0:
Version : 1.0
Creation Time : Wed Apr 29 16:24:27 2015
Raid Level : raid1
Array Size : 511936 (500.02 MiB 524.22 MB)
Used Dev Size : 511936 (500.02 MiB 524.22 MB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Wed Apr 29 19:21:36 2015
State : active, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 1
Spare Devices : 0
Name : localhost:0
UUID : 746cafb2:567a1ca3:7b8bc047:9c3512da
Events : 42
Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 17 1 active sync /dev/sdb1
0 8 1 - faulty /dev/sda1
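Before pulling any hardware it is worth confirming which physical disk /dev/sda actually is. One way, assuming smartmontools is installed, is to read the drive's serial number and match it against the label on the disk; the /dev/disk/by-id symlinks give similar information without extra packages:
# smartctl -i /dev/sda | grep -i serial
# ls -l /dev/disk/by-id/ | grep sda
If the failed disk no longer responds to smartctl, identify the healthy drive instead and pull the other one.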
Removing the Failed Disk
To remove /dev/sda, we first mark each of its partitions as failed and then remove them from their respective RAID arrays. (The kernel has usually already flagged them as faulty, as in the output above; running --fail again is harmless and simply makes sure.)
First we mark /dev/sda1 and /dev/sda2 as failed:
# mdadm --manage /dev/md0 --fail /dev/sda1
# mdadm --manage /dev/md1 --fail /dev/sda2
Then we remove /dev/sda1 from /dev/md0 and /dev/sda2 from /dev/md1:
# mdadm --manage /dev/md0 --remove /dev/sda1
# mdadm --manage /dev/md1 --remove /dev/sda2
# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb2[1]
1951503360 blocks super 1.2 [2/1] [_U]
bitmap: 9/15 pages [36KB], 65536KB chunk
md0 : active raid1 sdb1[1]
511936 blocks super 1.0 [2/1] [_U]
bitmap: 1/1 pages [4KB], 65536KB chunk
unused devices: <none>
If the system has hot-swappable drive bays, you can remove the bad drive now. If the drives are not hot-swappable, power down the system first. Replace the old /dev/sda hard drive with a new one that is the same size or larger; if the new drive is smaller, rebuilding the arrays will fail. After you have changed the hard disk, boot the system.
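On hot-swap hardware it can also help to tell the kernel to detach the device before you physically pull it, and to rescan the controller after inserting the replacement. A sketch using the standard sysfs interfaces (host0 below is only an example; use the SCSI host the drive bay is attached to):
# echo 1 > /sys/block/sda/device/delete
# echo "- - -" > /sys/class/scsi_host/host0/scan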
Adding the New Drive
The first thing we must do now is create exactly the same partition layout on the new /dev/sda as exists on /dev/sdb. We can do this with one simple command:
# sfdisk -d /dev/sdb | sfdisk /dev/sda
You can run 'sfdisk -l' to check that both hard drives now have the same partitioning:
# sfdisk -l /dev/sda /dev/sdb
Disk /dev/sda: 243031 cylinders, 255 heads, 63 sectors/track
Units: cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0
Device Boot Start End #cyls #blocks Id System
/dev/sda1 * 0+ 63- 64- 512000 fd Linux raid autodetect
/dev/sda2 63+ 243031- 242968- 1951634432 fd Linux raid autodetect
/dev/sda3 0 - 0 0 0 Empty
/dev/sda4 0 - 0 0 0 Empty
Disk /dev/sdb: 243031 cylinders, 255 heads, 63 sectors/track
Units: cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0
Device Boot Start End #cyls #blocks Id System
/dev/sdb1 * 0+ 63- 64- 512000 fd Linux raid autodetect
/dev/sdb2 63+ 243031- 242968- 1951634432 fd Linux raid autodetect
/dev/sdb3 0 - 0 0 0 Empty
/dev/sdb4 0 - 0 0 0 Empty
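Note that sfdisk operates on MBR partition tables (older versions cannot dump GPT at all). If your disks use GPT instead, a rough equivalent using sgdisk from the gdisk package is shown below; be careful with the argument order, because the disk named with -R is the one whose table gets overwritten:
# sgdisk -R /dev/sda /dev/sdb
# sgdisk -G /dev/sda
The second command gives the new disk its own random GUIDs so the two disks do not share identifiers.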
Now add /dev/sda1 to /dev/md0 and /dev/sda2 to /dev/md1:
# mdadm --manage /dev/md0 --add /dev/sda1
mdadm: re-added /dev/sda1
# mdadm --manage /dev/md1 --add /dev/sda2
mdadm: re-added /dev/sda2
Both arrays (/dev/md0 and /dev/md1) will now synchronize onto the new disk. View /proc/mdstat to follow the progress and to see when it has finished. During the synchronization the output will look like this (here the small /dev/md0 has already finished while /dev/md1 is still rebuilding):
# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda2[0] sdb2[1]
1951503360 blocks super 1.2 [2/1] [_U]
[>....................] recovery = 0.0% (640576/1951503360) finish=304.5min speed=106762K/sec
bitmap: 9/15 pages [36KB], 65536KB chunk
md0 : active raid1 sda1[0] sdb1[1]
511936 blocks super 1.0 [2/2] [UU]
bitmap: 0/1 pages [0KB], 65536KB chunk
unused devices: <none>
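If you prefer not to keep re-running cat, watch will refresh the view for you, for example every five seconds:
# watch -n 5 cat /proc/mdstat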
That's it, you have successfully replaced /dev/sda!
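One final point worth checking: if the machine boots from this array (the bootable flag on /dev/sda1 in the sfdisk output suggests it does), the new disk does not yet contain a boot loader, so the system could not boot from it if /dev/sdb failed next. Assuming GRUB is used, reinstalling it on the new drive looks like this (the binary is called grub2-install on some distributions):
# grub-install /dev/sda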