I was copying the data from our existing mail server to new hardware.
We had some kind of glitch. To this day, I’m not sure what it was
or how it happened. But the old mail server hung every time I tried
to reboot. The BIOS utilities said that the RAID array was corrupt
and would have to be rebuilt.
Fig. 1: RAID 5
This was aggravating, but not the end of the world, I thought. I pulled up
our most recent backup and started restoring. A few hours later, I
really started to sweat. The
backup was corrupted. There were data errors in the
This story has a happy ending: Having learned the hard way that one backup
is never enough, I dropped back to the previous backup. I loaded it,
started the server … and said a prayer of thanks. It worked!
Everyone repeats this mantra endlessly: “Make backups. Keep backups. Check
your backups.” But every one of us also gets busy. The transmitter
is hit by lightning, or we’re doing three remotes on the same day.
We think, “Ah, the server has been running fine, I can put it off
until tomorrow.” But tomorrow turns into the next day and before
you know it, into next week.
Then disaster strikes and you’re searching for your 6-month-old backup,
hoping that it’s OK.
I’ll start by stating what should be obvious: Backing up your data
is just as important as tower inspections, checking the logs,
changing air filters and doing PM at your transmitter sites. Find the
time to do it.
I think that any critical file server should use RAID, a “Redundant
Array of Independent Disks.” But several strong caveats apply.
First, be careful which type of RAID you use. More on that in a moment.
Second, become familiar with your RAID. Read the manual. One common
misconception is that RAID will automagically rebuild and restore a
failed array. That is not necessarily so! It
depends on the capabilities of your RAID controller and how you’ve configured it.
Third, remember that RAID only protects from drive failures. If your
software writes bogus data to the array, it’s still bogus. If you
have a power failure, the entire array could be corrupted. Use a good
uninterruptible power supply and test it regularly to prevent
problems of this type.
Fourth, and finally, make and keep good backups. If you think you don’t
need them just because you’re using a RAID array, you are going to
be badly burned eventually. What if the RAID controller itself has a
There is a great deal of information about RAID available on the Internet,
starting with Wikipedia. I’ll just hit the highlights.
RAID 0 is the simplest and is arguably useless. All it does is combine
drives into One Big Disk. It provides no redundancy for your data. RAID 1
is much better; it “mirrors” the data between two or more drives,
in essence, creating a copy of everything for you. RAID 3 and 4 are
rarely used nowadays, so I’ll skip them entirely.
My recommendation for larger storage arrays is RAID 5 (Fig. 1). If you buy a name-brand
server from Dell or HP, their websites will allow you to choose this,
as well as the controller and drive set that you want. Ask their
sales rep for recommendations too.
You want a RAID setup that will report a drive failure, then rebuild the
data on the replacement drive. (Again, this is not automatic; you
must choose a RAID setup that does this and then configure it.) You
also want a good hardware controller that handles the number
crunching, to keep down the load on your main processor. Avoid
so-called “software RAID” — i.e., RAID implemented in software
drivers — for serious, high-volume servers.
For example, we order our Dell servers with the PERC (PowerEdge RAID
Controller) installed and ready to go with matched drives. Ask
around, do some Web searches. Look at customer reviews. Sites like
Tiger Direct and PC Mall also have enterprise divisions that can help
you make a good decision.
MY RECOMMENDATION FOR LARGE ARRAYS: RAID 5
How does RAID 5 work? Briefly, referring to Fig. 1: When you write a file
to disk, it will be chopped into blocks, labeled “A,” “B” and
so on in the illustration. Block A1 is written to the first drive,
Block A2 to the second, and so on.
But a key feature is illustrated by the blocks labeled “Ap,”
“Bp” and so on. The “p” stands for “parity,”
and represents the file integrity and “check” data that the array
can use to detect errors, and then repair them on the fly. Note how
even the “check” data is scattered across the drives for greater
What about speed? Reading from the disk is nice and quick. In fact, if one
program wants the “A1” block, while a second wants the “B2”
block, since they’re on separate drives, a good controller will
even multitask the requests, satisfying them almost concurrently.
Writes are a different story, and this is the first inescapable tradeoff
with RAID 5. The data must be split, stuffed on different drives, and
the “check” data generated and stored as well. This is why you
want a good, “smart” controller like Dell’s PERC; it does all
of this for the operating system, saving the load on your main processor.
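To make the "check" data less abstract, here is a minimal Python sketch of the idea behind RAID 5 parity. It assumes simple XOR parity over one stripe and ignores how a real controller rotates parity across drives, so treat it as an illustration and not as a description of what the PERC or any other controller actually does. The function names are my own, invented for this example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def make_stripe(data, n_data_drives):
    """Split data into n equal blocks (zero-padded) and append one parity block."""
    block_len = -(-len(data) // n_data_drives)  # ceiling division
    padded = data.ljust(block_len * n_data_drives, b"\0")
    blocks = [padded[i * block_len:(i + 1) * block_len]
              for i in range(n_data_drives)]
    return blocks + [xor_blocks(blocks)]


def rebuild_block(stripe, lost_index):
    """Recreate the block at lost_index by XOR-ing every surviving block."""
    survivors = [b for i, b in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


stripe = make_stripe(b"station logs and mail", 3)  # A1, A2, A3 and Ap
recovered = rebuild_block(stripe, 1)               # pretend the second drive died
assert recovered == stripe[1]
```

Any single block, data or parity, can be rebuilt from the others. Lose two blocks from the same stripe and the data is gone, which is why a second failure during a rebuild is so dangerous.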
The second tradeoff is drive space. Only the useless RAID 0 allows you to
use all available space. The others must use at least some of the
available space to store a copy and/or corrective data. With a
typical RAID 5 using four 1 terabyte drives, for example, you can
expect about 3 terabytes of available space. The remainder is used
for the recovery information.
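The arithmetic is easy to script if you are comparing drive sets. This is a rough sketch that assumes identical drives, a RAID 1 set that mirrors one drive's worth of data across all members, and a RAID 5 set that gives up one drive's worth of space to parity. The function name is hypothetical, and real arrays lose a little more to formatting overhead.

```python
def raid_usable_tb(level, drive_count, drive_size_tb):
    """Approximate usable space in terabytes for RAID 0, 1 or 5."""
    if level == 0:
        if drive_count < 2:
            raise ValueError("RAID 0 needs at least two drives")
        return drive_count * drive_size_tb        # everything is usable
    if level == 1:
        if drive_count < 2:
            raise ValueError("RAID 1 needs at least two drives")
        return drive_size_tb                      # every drive holds the same copy
    if level == 5:
        if drive_count < 3:
            raise ValueError("RAID 5 needs at least three drives")
        return (drive_count - 1) * drive_size_tb  # one drive's worth goes to parity
    raise ValueError("only RAID 0, 1 and 5 are handled in this sketch")


print(raid_usable_tb(5, 4, 1))  # 3
```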
Carnegie Mellon released a study a few years ago. They discovered that, if one
drive in a RAID array fails, it is entirely possible to have a second
failure before the first drive can finish rebuilding (!). This is
especially the case on a really large drive array. The more the data,
(obviously) the longer it takes to rebuild and repair.
The moral here is simple: If your RAID array reports a failure, treat
it as an emergency. Following your manufacturer’s
instructions, replace that drive ASAP. You should also keep a spare
drive on hand, ready to go, for a hot swap. There’s no point in
having RAID if you don’t have a known-good spare ready.
Testing your backups takes time, and you’ll have to figure out a way to do it that
doesn’t disrupt operations. In the case of our mail server, I
actually have a second machine built and ready to go. From time to
time, I load the backup onto it and ensure that the server runs
Be careful where you store your backups as well. Never, ever store them on
the same drive as the original. I have been amazed at the number of
people who will do this. If the drive fails, how do you know the
backup won’t get trashed along with the original data?
And nowadays, it’s not just a simple matter of writing it to CD or DVD.
Our mail server, for example, typically has over 50 gigabytes of
data. That would span several disks. In fact, I use the test server
that I just mentioned, killing two birds with one stone. Once a week,
I copy all mail data from the main server to the backup over our
network. I can then immediately confirm a good backup by “running”
that data on the spare server.
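Running the data on a spare server is the real test. As a complementary check, and not something described above, you could also confirm that every file arrived intact by comparing checksums. The sketch below is one way to do that in Python, assuming both copies are reachable as ordinary directories; the function names are made up for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so that large mail stores don't exhaust memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(source_dir, backup_dir):
    """Return (relative path, problem) pairs for files missing or altered in the backup."""
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        copy = backup_dir / rel
        if not copy.is_file():
            problems.append((rel.as_posix(), "missing"))
        elif sha256_of(src) != sha256_of(copy):
            problems.append((rel.as_posix(), "differs"))
    return problems
```

Calling compare_trees() with the live data directory and the backup directory returns an empty list when every source file has an identical copy. Files that change while the copy is running will show up as "differs", so a check like this is most useful on data that is quiet during the backup.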
Ideally, you’d store some backups off-site. But there’s a world of
difference between storing traffic and billing data (which might fit
on the aforementioned DVD), and storing hundreds of gigabytes of
music, spot advertising or other data. There is no one good answer,
but consider a second server. If you have a high-speed data link
between your studio complex and a transmitter site, you might even
put that spare/backup server at that remote site.
But don’t overlook physics: even with a 100-megabit data link to the
transmitter site, it’s going to take hours to copy hundreds of gigabytes of
data. You can estimate the top limit by dividing the network speed by
10: With 100Base-T, you can’t expect better than 10 megabytes per second.
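Here is that rule of thumb as a few lines of Python, in case you want to plug in your own numbers. It is a best-case estimate only: it assumes the link is otherwise idle and that 1 gigabyte is 1,000 megabytes, and the function name is simply one I picked for the example.

```python
def estimate_copy_hours(data_gigabytes, link_megabits_per_sec):
    """Best-case copy time using the divide-by-10 rule of thumb."""
    megabytes_per_sec = link_megabits_per_sec / 10   # 100 megabits -> 10 megabytes
    seconds = data_gigabytes * 1000 / megabytes_per_sec
    return seconds / 3600


# A 50-gigabyte mail store over a 100-megabit link:
print(f"{estimate_copy_hours(50, 100):.1f} hours")   # about 1.4 hours
```

Scale that up to 500 gigabytes of audio and the same link needs the better part of 14 hours, even before other traffic gets in the way.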
Also, read up on your operating system. Do some Google searches on phrases
like, “backup Windows Server 2012” to see what others are doing.
See what problems they’re experiencing.
Drive imaging is another idea, but that’s for another article in and of
itself. With this technique, you take the server offline, then make a
“snapshot” of the entire drive or RAID array. That’s a great
way to do it, but it’s not for the faint of heart or
However you do it, your critical servers must keep running. You must be able
to restore them as quickly as possible when they fail. To do that, I
recommend that you start with something like RAID 5 with a good controller.
Then do regular backups, following your software’s recommendations. Find
out what other users recommend. For example, I joined the online
forums for our audio automation (RCS NexGen) primarily so that I
could swap ideas with other people who use that system.
Keep more than one backup! With our mail server, I keep the latest one and
the one previous to that. I’d keep even more if I had the space.
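If your backups land in one directory as dated archive files, a small script can enforce a "keep the newest N" policy for you. The sketch below is only one way to do it. It assumes the backups are single files matching a hypothetical "*.tar.gz" pattern and that file modification time reflects backup age. Test it on a scratch directory first, because it deletes files.

```python
from pathlib import Path


def prune_backups(backup_dir, keep=2, pattern="*.tar.gz"):
    """Delete all but the `keep` newest backup files; return the paths removed."""
    if keep < 1:
        raise ValueError("keep at least one backup")
    backups = sorted(Path(backup_dir).glob(pattern),
                     key=lambda p: p.stat().st_mtime,
                     reverse=True)              # newest first
    removed = []
    for old in backups[keep:]:                  # everything past the newest `keep`
        old.unlink()
        removed.append(old)
    return removed
```

The default of keep=2 matches the latest-plus-previous scheme described above. Raise it if you have the space.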
Finally, test those backups to
make sure that they will, in fact, save your bacon when and if you
have a need for them.
Poole is market chief at Crawford Broadcasting in Birmingham, Ala.
Have an idea for a future column on radio IT? Write to us at