RAID vs MTBF of hard drives

It pays to be aware of what disks your databases reside upon, if only to reduce spindle contention. Lots of installations have internal RAID systems with data striped across multiple drives, and overall this is a good thing.

But as a sysadmin you need to be aware of MTBF: Mean Time Between Failures. Hard drives are physical devices, and eventually they’re going to die. Even solid-state drives will eventually fail; there’s an interesting torture test of SSDs at Tech Report. The test concluded last year, with the last two drives to die having had over 2.5 PETABYTES of data written to them! It’s worth a read.

Here’s the problem. Typically, when we build or buy servers, we’ll order X hard drives from the same vendor at the same time. There’s a strong likelihood that all of the drives will come from the same manufacturing batch, which means they’ll have approximately the same MTBF, the expected number of hours of operation before a drive fails.
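
To put some rough numbers on what an MTBF rating means for a whole array, here’s a quick sketch using hypothetical figures and a simple exponential failure model:

```python
# Back-of-the-envelope sketch with hypothetical numbers: how a single
# drive's MTBF rating translates into the odds of a failure somewhere
# in an array, assuming an exponential failure model and (optimistically)
# independent failures.
import math

MTBF_HOURS = 1_000_000          # hypothetical manufacturer rating
HOURS_PER_YEAR = 24 * 365
N_DRIVES = 12                   # hypothetical array size

# Probability that one drive fails within a year.
p_single = 1 - math.exp(-HOURS_PER_YEAR / MTBF_HOURS)

# Probability that at least one of N identical drives fails within a year.
p_any = 1 - (1 - p_single) ** N_DRIVES

print(f"Single drive, chance of failing this year:          {p_single:.2%}")
print(f"At least one of {N_DRIVES} drives failing this year:   {p_any:.2%}")
```

And that’s with the optimistic assumption that failures are independent; drives from the same batch tend to fail closer together than the math above suggests.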

What does this have to do with SQL Server, or any server for that matter?

You come in on a Monday morning and hear a beep in the computer room. With a little investigation you find the server that’s crying for help, or perhaps your network operations monitor sees the problem: a drive in a RAID array has failed. No problem: go to the spares, find a drive with the same capacity, and replace the dead drive.

This is where the MTBF comes in. If all of the drives came from the same batch, then they have approximately the same MTBF. One drive has failed, so the rest of the drives are probably not far from failure either. And what happens when the failed drive is replaced? The RAID controller rebuilds it. How does it rebuild the new drive? It reads the surviving drives, recalculates the missing data from parity, and writes it to the new drive. So you now have a VERY I/O-intensive operation going on, with heavy read activity across a bunch of drives that are probably pushing end of life.
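
If it helps to picture why the rebuild has to touch every surviving drive, here’s a minimal sketch of the parity arithmetic a RAID-5-style array relies on (toy byte blocks for illustration, not real controller code):

```python
# Toy illustration of RAID-5-style parity: the parity block is the XOR of
# the data blocks, so any one lost block can be rebuilt by XOR-ing ALL of
# the surviving blocks together. That's why a rebuild reads every drive.
def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)

# Three data blocks and their parity, as the array would store them.
d0 = b"\x10\x20\x30\x40"
d1 = b"\x0a\x0b\x0c\x0d"
d2 = b"\xff\x00\xff\x00"
parity = xor_blocks([d0, d1, d2])

# Pretend the drive holding d1 died: rebuilding it means reading
# everything that's left and XOR-ing it back together.
rebuilt_d1 = xor_blocks([d0, d2, parity])
print("Rebuilt block matches the lost one:", rebuilt_d1 == d1)
```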

Additionally, even before the failed drive is replaced and the new drive is rebuilt, the remaining drives are already under extra strain: to keep the array running, every read of the missing data has to be reconstructed from parity on the surviving drives.

Your likelihood of an additional drive failing just went up dramatically. It might happen during the rebuild, or it might happen later, after the system is back in full swing. And it might be a long way down the line; you never know when a drive (or many drives) will exceed its expected life or fall short of it.
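
For a rough sense of how much the risk moves (hypothetical numbers again, same simple exponential model as above), compare the chance of a second failure during a day-long rebuild when the surviving drives are healthy versus when they’re near end of life:

```python
# Hypothetical comparison: chance that at least one surviving drive fails
# during the rebuild window, for healthy drives vs. drives near end of life.
# The MTBF figures are illustrative, not measurements.
import math

SURVIVING_DRIVES = 11      # hypothetical: 12-drive array minus the dead one
REBUILD_HOURS = 24         # hypothetical rebuild duration under heavy load

def p_second_failure(effective_mtbf_hours):
    """Probability that at least one surviving drive fails during the rebuild."""
    p_single = 1 - math.exp(-REBUILD_HOURS / effective_mtbf_hours)
    return 1 - (1 - p_single) ** SURVIVING_DRIVES

print(f"Healthy drives (1,000,000 h effective MTBF):  {p_second_failure(1_000_000):.3%}")
print(f"End-of-life drives (10,000 h effective MTBF): {p_second_failure(10_000):.3%}")
```

The specific figures are made up; the point is that the rebuild workload lands on drives whose real-world failure rate may be far worse than the rating on the box.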

This is just something to be aware of, so don’t be surprised if it happens. It’s definitely an important reason to make sure your backups are running well and are actually restorable.

There is a way to try to dodge this issue, though it isn’t terribly easy: multi-source your drives. Buy drives from multiple suppliers and keep track of which drive from which batch is in which array. Gee, sounds like a job for a database!
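
A minimal sketch of what that tracking might look like (SQLite here purely for illustration; the table and column names are made up):

```python
# Toy drive-inventory tracker: which drive, from which vendor and batch,
# sits in which server and bay. Names and values are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE drive_inventory (
        serial_number TEXT PRIMARY KEY,
        vendor        TEXT NOT NULL,
        batch_code    TEXT NOT NULL,
        capacity_gb   INTEGER NOT NULL,
        server_name   TEXT,
        array_bay     TEXT,
        installed_on  TEXT
    )
""")

conn.execute(
    "INSERT INTO drive_inventory VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("WX1234567890", "VendorA", "BATCH-2023-17", 4000,
     "SQLPROD01", "Bay 3", "2023-06-01"),
)

# When a drive fails, one query shows its batch-mates still in service.
for row in conn.execute("""
    SELECT serial_number, server_name, array_bay
    FROM drive_inventory
    WHERE batch_code = (SELECT batch_code FROM drive_inventory
                        WHERE serial_number = ?)
""", ("WX1234567890",)):
    print(row)
```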

I’m not sure about mixing drives from different manufacturers. I would think that if they were the same capacity and speed you’d be OK, but I’m not 100% confident of that; you might introduce some performance differences. Back in Ye Olde Days of network administration, when we needed to know things like cylinder and sector counts, things were more complicated. But we don’t bother with that (for the most part) these days.