This post follows on from my entry about the server's hard drive dying, and covers how I set up Raid 1.
During an automatic weekly backup, my web server's hard drive failed. Luckily I run a database backup nightly and had last week's full backup.
I decided it was time to move to Raid. Raid 1 keeps an identical copy of your data on each of two hard drives. When one drive fails, the other continues to operate, allowing you to install a new drive and rebuild the array.
Please note, Raid is NOT a backup solution.
So I went and purchased two Seagate 80GB SATA hard drives. Since the motherboard doesn't have SATA support, I also needed a SATA controller. I decided it was about time to move to SATA, and the 80GB SATA drives also come with an 8MB cache. The old hard drive in the server was the same model but in IDE form (and thus had a 2MB cache).
So I got home (from the Hunter Valley) on Saturday afternoon with two drives in hand. Later that day I got a PCI SATA controller based on an ALi m5283 chipset.
I plugged it in, set up the Raid 1 through the card's BIOS and loaded the FreeBSD 5.4 CD. I was greeted with the friendly message that no hard drives were detected, damn. I then did some googling and found that there really wasn't any support for this card outside of Windows. Great.
Anyway I needed a solution quickly and it was late. I needed to get a SATA controller from somewhere on Sunday. The only shop open close to me was Adelong and they had one SATA controller based on the Silicon Image 3112 chipset. Another quick google and from the looks of it FreeBSD supported the card.
So I got it home, plugged it in and set up another hardware raid (quasi hardware at least). Well, FreeBSD loaded and showed me two hard drives. Interesting. With Raid 1 you should only see one. Looks like it supported the card, but not the raid function. Oh well, I thought, software raid should do.
So after a bit of Google searching and some failed attempts at software raid, I found this quick howto:
1. Install FreeBSD on to ad4.
2. Reboot with the Install CD.
3. Enter Fixit mode. (For FreeBSD less than 5.4, use Install CD disc2 as the “live filesystem”)
4. # chroot /dist
# mount_devfs devfs /dev
# gmirror load
# gmirror label -v -b round-robin gm0 /dev/ad4
# gmirror insert gm0 /dev/ad6
# mount /dev/mirror/gm0s1a /mnt
# echo 'geom_mirror_load="YES"' >> /mnt/boot/loader.conf
# echo 'swapoff="YES"' >> /mnt/etc/rc.conf
5. # sed "s%ad4%mirror/gm0%" /mnt/etc/fstab > /mnt/etc/fstab.new
# mv /mnt/etc/fstab.new /mnt/etc/fstab
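If you want to see what the sed line in step 5 does before letting it loose on the real fstab, you can try it on a scratch file first. This is only a sketch: the sample fstab entries below are made up for illustration and are not copied from my server, so your slices and mount points will differ.

```shell
# Hypothetical sample fstab, for illustration only (not the server's real file).
sample_fstab=$(mktemp)
cat > "$sample_fstab" <<'EOF'
/dev/ad4s1b  none    swap    sw         0  0
/dev/ad4s1a  /       ufs     rw         1  1
/dev/acd0    /cdrom  cd9660  ro,noauto  0  0
EOF

# Same substitution as step 5: the first "ad4" on each line becomes "mirror/gm0".
rewritten=$(sed "s%ad4%mirror/gm0%" "$sample_fstab")
echo "$rewritten"

# Clean up the scratch file.
rm -f "$sample_fstab"
```

The % characters are just sed's delimiter here, chosen so the slash in mirror/gm0 doesn't need escaping. Lines that don't mention ad4 (like the CD drive) pass through untouched.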
Right, looked easy. So I set up my first software raid and the system booted! Great.
I did a gmirror status gm0 and got this output:
Name Status Components
mirror/gm0 DEGRADED ad4 ad6(6%)
Gmirror was syncing up my second hard drive (ad6), so I decided to leave it for the night and headed to bed.
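Rather than eyeballing that output, the Status column can be pulled out with a small script. A minimal sketch, assuming the output keeps the same three-column layout shown above; mirror_state is a hypothetical helper name of my own, not part of gmirror.

```shell
# Hypothetical helper: read `gmirror status` output on stdin and print the
# Status column for the named mirror (e.g. DEGRADED while a sync is running).
mirror_state() {
    awk -v name="mirror/$1" '$1 == name { print $2 }'
}

# Sample input copied from the output above. On the live box this would be:
#   gmirror status gm0 | mirror_state gm0
sample='Name Status Components
mirror/gm0 DEGRADED ad4 ad6(6%)'

state=$(printf '%s\n' "$sample" | mirror_state gm0)
echo "gm0 is $state"
```

Something like this could be dropped into a cron job that mails you when the state is anything other than what you expect.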
The next morning (Monday) I awoke to the following error:
ad4:TIMEOUT - WRITE_DMA retrying (2 retries left)
The system was still online, but whenever there was any disk activity the system would lock up for a few seconds before doing anything.
I did some searching and it seemed that the ATA drivers had some issues with SATA chipsets. Great.
Anyway, there was a patch out for it, so I patched the system and recompiled the kernel. The system seemed to boot somewhat better. But I spoke too soon. As soon as I tried to copy over the backup image, gmirror reported that ad4 and ad6 were disconnected and the system locked up. Fantastic.
Another Google search, and it simply looks like the Silicon Image 3112 (Sil3112) is a crap piece of hardware. Don't buy one.
A quick call to Bryn and I had a lovely HighPoint RocketRAID 1520 card in my hands at about 5pm. Over three times more expensive than the first card I tried, and in a much larger box, it was guaranteed to work! (It also helped that the box said FreeBSD support on it :p)
I loaded it up, set up the raid through the BIOS and FreeBSD detected it as a hardware raid! Sweet.
Jul 19 21:21:48 metro kernel: ad4: 76319MB [155061/16/63] at ata2-master UDMA133
Jul 19 21:21:48 metro kernel: ad6: 76319MB [155061/16/63] at ata3-master UDMA133
Jul 19 21:21:48 metro kernel: ar0: 76319MB [9729/255/63] status: READY subdisks:
Jul 19 21:21:48 metro kernel: disk0 READY on ad4 at ata2-master
Jul 19 21:21:48 metro kernel: disk1 READY on ad6 at ata3-master
metro# atacontrol status ar0
ar0: ATA RAID1 subdisks: ad4 ad6 status: READY
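The same status line is easy to check from a script if you'd rather not log in to look. A rough sketch, assuming the line always ends in "status: READY" when the array is healthy, as it does above; array_ready is a hypothetical helper name, and what other states look like is not something I've captured here.

```shell
# Hypothetical helper: succeed only if the atacontrol status line on stdin
# ends in "status: READY".
array_ready() {
    grep -q 'status: READY$'
}

# Sample line copied from the output above. On the live box this would be:
#   atacontrol status ar0 | array_ready
sample='ar0: ATA RAID1 subdisks: ad4 ad6 status: READY'

if printf '%s\n' "$sample" | array_ready; then
    result="ok"
else
    result="needs attention"
fi
echo "ar0: $result"
```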
Four and a half days, three SATA controllers and many hours later, the server was back online that night. :)