Fri 20 Mar 2009
The past week has given me major trouble. I was tasked with restoring a large database from our offsite storage. When the tapes came back I found that their indexes were no longer available and I would need to read them in from the tapes…there were only 107 tapes. Not knowing the software well enough to do this quickly, I contacted support, which is where things began to get more “interesting”.
After four hours on the phone I was able to identify the two tapes needed to recover the 79GB database file and started reading in the specific saveset. Two hours later I was able to start a restore, which failed: 2GB of the restore file was missing. After another two hours on the phone with support I was told “Let’s reposition the tape. It could take a while, on newer technology I’ve seen it take an hour, on LTO1 and LTO2 drives I’ve seen it take 8 hours.” You guessed it, I have LTO2 drives. Fortunately, I have plenty of drives to reposition the tapes with, so it won’t impact backups. Unfortunately, I have a time limit on the restore that’s fast approaching.
So what do you do when you back up your file systems? Do you simply trust that your backup software validates your tapes, or do you test them regularly? Are you satisfied with seeing an email at the end of a backup routine stating “SUCCESS”? The answer is simply NO. Your backups are only as good as your ability to restore from them. Keeping that in mind, along with all the different technologies and services available, what do you choose?
For us the answer is simple. We require low-cost, reliable, secure offsite data storage, as do most companies nowadays. TAPE. We’ve looked into colocated services and replicated SANs with virtual tape backup, but the cost far exceeds the benefits. Tape technology has been proven over and over for decades. There is no cost-effective replacement for a good old-fashioned tape, even taking into consideration the troubles it can give you. Our entire datacenter can be put onto 6 tapes costing $25 each: 4.8TB for $150.
Any good backup initiative should be paired with an equally solid restore plan. So next time you recommend a backup solution, plan regular test restores as well, because there’s nothing worse than spending an entire week restoring one file.
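A regular test restore can be as simple as restoring to a scratch location and comparing the result against the data it should match. The sketch below is only an illustration of that idea, not anything our backup software provides: `file_digest` and `compare_trees` are hypothetical helper names, and it assumes the source tree has not changed since the backup was taken (in practice you would compare against checksums recorded at backup time).

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(source_dir, restored_dir):
    """Compare a restored tree against its source (hypothetical helper).

    Returns (missing, mismatched): relative paths absent from the restore,
    and relative paths whose contents differ from the source.
    """
    source_dir, restored_dir = Path(source_dir), Path(restored_dir)
    missing, mismatched = [], []
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(source_dir)
        dst = restored_dir / rel
        if not dst.is_file():
            missing.append(rel.as_posix())
        elif file_digest(src) != file_digest(dst):
            mismatched.append(rel.as_posix())
    return missing, mismatched
```

A restore that came back 2GB short, like the one above, would show up in `mismatched` right away instead of at the moment the database refuses to start.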
April 15th, 2009 at 3:46 pm
LTO — nice.
The *good* tape drives can still stream linear data faster than any random access disk. It’s hard convincing people of those statistics; most go with backup-to-disk. It’s like the “mainframe is dead” myth rewritten for tapes.
In 2002, we multiplexed several disk clients just to keep the tape drives spinning. This was despite requiring a dedicated backup NIC and, many times, a dedicated backup switch that datacenters funneled directly to the backup servers. (This was Netbackup 4.x with Solaris 8 servers in front of ADIC refrigerator-size libraries. LTO of course.) The block sizes used by network and disk filesystems were (and are) just not optimized for transferring large datasets in short time periods. I love it when people don’t get this until they see a VLDB come to life from LTO.
And ditto on test restores. We charged extra to verify with Netbackup so most customers opted out. The customers with disk filesystems that didn’t support snapshots asked for a copy of the SLA/contract when a backup skipped files that were in use, or when their network mangled the data, and it was always an unsympathetic “told you so” moment.
These are age-old debates. No one talks about cpio anymore because tar supposedly deprecated it. No; GNU tar (like most versions) doesn’t ship with internal verification. cpio almost always shipped with CRC at the flip of a switch. The per-file verification that ole’ cpio did was more granular than the per-job/per-dataset verification typically done via md5/sha1 by the big boys. Of course, restoring the whole backup client is best, but nobody does that.
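To make the granularity point concrete, here is a rough sketch contrasting one digest over a whole job with a checksum per file. It does not reproduce cpio’s archive format or any vendor’s verification; `zlib.crc32` simply stands in for whatever per-file checksum the tool uses, and `build_manifest`, `job_digest` and `damaged_files` are made-up names for illustration.

```python
import hashlib
import zlib


def build_manifest(files):
    """Map each file name to the CRC-32 of its contents (files: name -> bytes)."""
    return {name: zlib.crc32(data) & 0xFFFFFFFF for name, data in files.items()}


def job_digest(files):
    """One SHA-1 over the whole dataset, hashed in sorted name order."""
    h = hashlib.sha1()
    for name in sorted(files):
        h.update(name.encode("utf-8"))
        h.update(files[name])
    return h.hexdigest()


def damaged_files(manifest, files):
    """Return names that are missing or whose CRC-32 no longer matches."""
    return sorted(
        name
        for name, crc in manifest.items()
        if name not in files or (zlib.crc32(files[name]) & 0xFFFFFFFF) != crc
    )


# Illustrative data only: two small "files" and a restore with one byte mangled.
backup = {"db.dat": b"rows" * 1000, "app.log": b"log line\n" * 50}
manifest = build_manifest(backup)
digest = job_digest(backup)

restored = dict(backup)
restored["db.dat"] = b"X" + backup["db.dat"][1:]

job_changed = job_digest(restored) != digest   # the job-level check only says "something is wrong"
bad = damaged_files(manifest, restored)        # the per-file check names the damaged file
```

The job-level digest tells you the dataset is bad; the per-file manifest tells you which file to go back to tape for.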
Bacula has very Netbackup-like tests for backup jobs, supports snapshots, supports a lot of OSes, and is free. Still use that and cpio against ole DDS