Backup with Data Deduplication - A Conversation Beyond The Compression Ratio
November 30th, 2009
OK, deduplication technology is cool. It makes disk a viable target for longer-term retention. Dedupe, however, is not the panacea of backup. I have been getting a lot of questions about deduplication from my customers and it's a fun topic to discuss. Being a professional pessimist, though, it's my job to play Debbie Downer at the dedupe party and say "hey now, let’s not lose sight of the fundamentals, people."
1. Reporting – It’s almost not worth doing a backup if you can’t prove it happened or, more importantly, explain why it didn’t. I appreciate that not all of us are waiting for the SEC to kick in the doors and check our latest backup reports to decide whether we are going to jail, but reporting still matters. Some products, such as CommVault’s Simpana, have a very nice native reporting tool with an option to do some very cool statistical trending that makes decisions about necessary throughput and media less dependent on the crystal ball. Even if you are not looking for a complete solution overhaul, have already taken the Data Domain jump, or are just plain happy with BackupExec, there are remarkably functional tools at increasingly competitive prices, such as EMC’s Data Protection Advisor, that cover all the mainstream backup tools.

CommVault’s Data Growth Report from the SRM expanded reporting tools.

2. Integration with the platforms that drive your business – While all the big hitters are touting integration with the soon-to-be-ousted VCB, some products have really stood out, such as Symantec’s NetBackup, which allows a single backup to provide both full-machine and single-file restores. There are also a number of VMware-specific solutions that have introduced a flavor of dedupe into their technologies. Veeam is a standout here that is probably worth a look, providing replication and backup with dedupe in a single, very cost-attractive product. This product also lends itself well to where I see next-generation data protection going. See my rant "Why Back Up Your Business Data?"
3. Speed – Depending on how you implement your backups, deduped or not, you may still be racing the sun to get your servers backed up before your users show up to change all the files again. Avamar is a clear dedupe standout here, since the method it uses to perform the dedupe usually results in ridiculously shorter backups (see Justin’s blog "Data Domain vs. EMC Avamar: Which deduplication technology is better?"). However, if you want to add media servers under a single management interface to increase the sheer brute force of your solution, with or without deduplication, that will keep you with the old standbys like Symantec’s NetBackup and CommVault’s Simpana.
Other criteria off the top of my head include:
1. Do you have application agents for MY applications?
2. Can you restart your backup / restore jobs from where they left off?
3. What is your bare metal solution?
4. Can you protect desktops with the same interface as the datacenter?
5. Do you have integrated archiving / compliance search?
6. How difficult is it to protect / recover the backup solution itself with history?
7. Can you multiplex / multi-stream backups for improved throughput?
8. Can I write to disk and tape at the same time?
9. How granular is your security construct?
10. Do you support my hardware with tools such as NDMP?
11. How well does your solution work with my firewall?
12. If I back up to disk, how do I cut tapes / restore from tapes?
13. Does your solution have CDP as well as regular backups?
14. Do I have to configure every server or does your solution leverage policies?
15. Does your solution manage encryption / encryption keys?
16. Does your solution push updates or do I manually update all the clients?
17. How functional are your GUI and command-line interface, and what can / can’t I do from each?
This list could go on for another hundred points depending on the specific needs of your business, but I think we are now at the point where the deduplication conversation needs to extend to what our dedupe vendor is bringing to the table beyond a compression ratio.
Why Back Up Your Business Data? Server and Email Restores Aren't Good Enough Reasons...
October 27th, 2009
What do train tracks and data backup have in common? Do you happen to know why train tracks are 4 feet, 8 1/2 inches apart in England? Because they were built following the ruts left by Roman chariots.
Stop Backing Up!
I am not kidding, guys, just stop it.
One of my favorite questions to ask smart IT folks is why they back up data at all. After the unavoidable litany of doing file-level restores, email restores, server restores, compliance requirements, etcetera, etcetera, we come to the real answer: the guy before me did it this way.
Who was this guy anyway? For all we know his retention scheme was based on the number of tapes that came free with the Travan tape drive he bought in the 80s. At least there is some logic to that answer.
In all seriousness, there are lots of good reasons to do backups, but far fewer IT shops have these requirements than we think. So before we spend a small fortune in deduplication, application agents, WAN acceleration, and tape infrastructures, let's figure out if what we are doing is really serving our objectives or just another (Roman) rut.
Objective 1: Restore a file / email—snapshots anyone?
If you really want to get a file back fast, use a host-based or (even better) array-based snapshot mechanism. In most cases, these are free and easy to implement and sure beat re-cataloging a bunch of tapes. Oversize your storage by enough to store 30 days' worth of snapshots and you have met the requirements for 99% of the restores IT pros encounter. In terms of email (specifically Exchange), leverage the tools already at your disposal, namely Deleted Item Retention. Not many users are using Shift+Delete in their day-to-day work, so this covers this common restore handily. Restores outside of a 30-day window are extremely uncommon, but monthly and even annual snapshots are possible; these, though, should have a pretty good rationale before implementation.
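To put a rough number on "oversize your storage," here is a back-of-the-napkin sketch in Python. It assumes each daily snapshot consumes roughly one day of changed data, which is only an approximation of how copy-on-write snapshots behave. The function name, the 1 TB volume, and the 2% daily change rate are made-up values for illustration, not vendor guidance.

```python
def snapshot_capacity_gb(primary_gb, daily_change_rate, retention_days=30):
    """Estimate total capacity: live data plus space held by daily snapshots.

    Assumption (not from any vendor): each daily snapshot pins about one
    day's worth of changed blocks, i.e. primary_gb * daily_change_rate.
    """
    snapshot_gb = primary_gb * daily_change_rate * retention_days
    return primary_gb + snapshot_gb


# Hypothetical example: a 1 TB volume where about 2% of the data changes per day.
total = snapshot_capacity_gb(primary_gb=1000, daily_change_rate=0.02)
print(f"Provision roughly {total:.0f} GB to keep 30 days of snapshots")
```

Measure your own change rate before trusting the result; a database volume and a file share will give very different answers.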
Objective 2: Restore a server
An uncommon restore in the grand scheme of things—but definitely something we want to be prepared for—is bringing the whole server back (potentially offsite). I have yet to find a bare metal solution I really admire (reliable, inexpensive, easy to use, easy on storage, etc). My personal favorite solution here is leveraging some sort of virtualization, even if you only put one virtual machine on a physical server, along with some mainstream replication technology. The virtualization technology can be the free stuff for this to work, so all you are out is the replication software cost or array replication cost, which is generally in the same ballpark as backup software and some old server hardware that you would otherwise be using to hold up your installation manuals in the back room. This solution is far more effective toward the goal and much easier to test. Did I say test? Yeah, do it. Figuring out server level restores at 2AM sucks. In terms of retention, I have no use for a server image from a year ago. I am sure I can contrive a reason to be contrary but that’s just being obtuse. So keep an image, or a week’s worth, and move on.
Objective 3: Compliance
Okay, you got me. This one and its ridiculous retention requirements may drive us to tape purgatory, but at least I never really have to use them for operations. My compliance officer can hug the tapes at night, but I am not betting my job on a medium that cannot be tested outside of an actual restore and is very susceptible to environmental fluctuations such as heat, dust, gravity, humidity, cold, air, time, quarks, palm sweat, the breath of the tape guy, etc. I would still like to have it in writing from someone who has to sign the check for this stuff (boss, lawyer, compliance officer, etc.) as to exactly why they ask for the retentions they do. Again, maybe they got their guidance from the guy with the free tapes in the 80s…
Deduplication Wars: EMC Avamar vs CommVault Simpana
June 2nd, 2009
OK, so you have decided that deduplication is the best thing ever and a must-have for your backup needs. The next big question on the horizon is what KIND of deduplication is right for you. Two of the big hitters in the market today are EMC’s Avamar and CommVault’s Simpana products. Both products seem to be doing very well in the wild, and they approach deduplication in completely different ways.
In the case of Avamar, the product is deduplicating at the client using variable block deduplication. Once the scan is complete on the client server and the deduplication hash is created, the client checks back with the Avamar Data Store appliance farm to see which blocks the farm has not seen, and then only the truly unique blocks across the environment (not just that server) are sent over the wire. This results in extremely high levels of deduplication AND remarkably fast backups, since very little data is normally left to send after deduplication and comparison to the rest of the environment. The data is stored on the Avamar Data Store appliance, which presents a pretty simple GUI for recovery. The only major chink in the armor of Avamar is that it cannot natively create tapes for those data sets you may want to retain longer than you have space to keep on the appliance farm.
CommVault came at the deduplication process a completely different way and leveraged their existing tape archive construct to create a form of fixed block deduplication. In this case the clients do the same thing they always did: run their scans, package the data, and then shoot it out. Once the data gets to the Media Agent, the deduplication occurs and the data is spit out onto any disk target supported by CommVault. Since the deduplication is fixed block, the deduplication ratios are not as good as with variable block, aka Avamar, but certainly much better than typical compression. Since the deduplication occurs on the Media Agent, there are no savings in backup window time. The good news is that this is CommVault, and cutting tapes is its forte: completely automated, with the ability to have different retentions on each type of media to fit all your compliance desires within a single tool. Also, since the format of the media archive did not change, restores are just as fast with deduplicated data as they were with plain Jane backup to disk, which is huge if you have a lot of data to restore. Avamar can be sluggish in terms of restore on the smaller deployments but is still certainly functional for your everyday restore needs.
In the grand scheme, Avamar is the holy grail of backup speed since it only ever sends fractions of incremental data over the wire to the target, which reduces not only backup times but also the impact of backup on both the hosts and the network. I also give CommVault a major tip of the hat for how they leveraged their existing technology and morphed it into a deduplication technology that brings huge benefits to their current customer base while staying on the commodity hardware bandwagon.
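For the curious, the general source-side pattern described above (chunk on the client, hash, ask the target what it has not seen, send only that) can be sketched in a few lines of Python. This is a toy illustration and not Avamar's actual algorithm or API: the rolling checksum, the chunk size limits, and the names `chunk_variable`, `DataStore`, and `backup` are all assumptions made up for the example.

```python
import hashlib


def chunk_variable(data, mask=0x3F, min_size=16, max_size=256):
    """Split data where a toy rolling checksum hits a pattern, so chunk
    boundaries follow the content rather than fixed offsets."""
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling = ((rolling << 1) + byte) & 0xFFFFFFFF
        length = i - start + 1
        if (length >= min_size and (rolling & mask) == mask) or length >= max_size:
            chunks.append(data[start:i + 1])
            start, rolling = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # whatever is left becomes the last chunk
    return chunks


class DataStore:
    """Hypothetical stand-in for the backup target: one shared chunk index."""

    def __init__(self):
        self.blocks = {}

    def missing(self, hashes):
        return [h for h in hashes if h not in self.blocks]

    def put(self, digest, chunk):
        self.blocks[digest] = chunk


def backup(data, store):
    """Client side: hash every chunk, ask the store which hashes it lacks,
    and send only those. Returns (recipe, bytes_sent)."""
    by_hash, recipe = {}, []
    for chunk in chunk_variable(data):
        digest = hashlib.sha256(chunk).hexdigest()
        by_hash[digest] = chunk
        recipe.append(digest)
    needed = store.missing(list(by_hash))
    for digest in needed:
        store.put(digest, by_hash[digest])
    return recipe, sum(len(by_hash[d]) for d in needed)


def restore(recipe, store):
    """Rebuild the original bytes from the ordered list of chunk hashes."""
    return b"".join(store.blocks[d] for d in recipe)


store = DataStore()
server_a = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(200))
_, sent_first = backup(server_a, store)
_, sent_again = backup(server_a, store)  # nothing changed, so nothing is sent
print(sent_first, sent_again)
```

Because the index is shared across clients, a second server holding mostly the same data also sends very little, which is the "across the environment" part of the story.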
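Why does fixed block dedupe tend to give lower ratios? A small Python sketch makes the point. It is a toy under stated assumptions and not CommVault's implementation: the 64-byte block size and the names `fixed_blocks` and `new_block_count` are invented for the example. Data is cut at fixed offsets, so overwriting a byte in place dirties one block, while inserting a single byte shifts every later boundary.

```python
import hashlib

BLOCK_SIZE = 64  # toy value; real products use far larger blocks


def fixed_blocks(data, size=BLOCK_SIZE):
    """Cut data at fixed offsets, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def new_block_count(data, seen):
    """Count blocks whose hash is not yet in `seen`, recording them as we go."""
    fresh = 0
    for block in fixed_blocks(data):
        digest = hashlib.sha256(block).hexdigest()
        if digest not in seen:
            seen.add(digest)
            fresh += 1
    return fresh


# 2048 bytes of pseudo-random sample data, i.e. 32 blocks of 64 bytes.
original = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(64))
overwritten = b"X" + original[1:]   # first byte changed in place
shifted = b"X" + original           # one byte inserted at the front

seen = set()
first = new_block_count(original, seen)
after_overwrite = new_block_count(overwritten, set(seen))
after_insert = new_block_count(shifted, set(seen))
print(first, after_overwrite, after_insert)
```

On this sample the in-place overwrite produces one new block, while the one-byte insert makes every block look new. Variable block schemes are designed to re-align after such a shift, which is a big part of why their ratios tend to come out higher.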
Obviously there are many more features to both products worth investigating and comparing but now you know how the two differ technically in terms of the deduplication angle.
ALTUS from Seven10 Storage: Backup Solution for EMC Centera and Possible Sleep Aid
May 2nd, 2009
Who Needs To Back Up A Centera Anyway?
With the recent demise of the ill-fated Centera Backup and Recovery Manager (CBRM), it appears that EMC is throwing in the towel on Centera backup software. So who needs Centera backup anyway? Maybe you chose to just buy one Centera and put it in a room with a sprinkler head, better known as professional Russian Roulette. Or maybe professional paranoia has kept you from drinking all the EMC Kool-Aid on replication being the final answer to data protection on the Centera.
Backup is one of those necessary evils in many IT shops. But backing up the Centera by its very nature is not trivial, since it is not your run-of-the-mill array with your run-of-the-mill operating system. Centera’s code, CentraStar, is something far more purpose built (read “black box complete with Oompa Loompas and a secret handshake”) for the ultra compliant needs of healthcare records, federal geographic surveys, good homebrew recipes, or anything else that absolutely, positively must not be changed once written.
So how do you back up this genetically enhanced WORM? That question was hard for even EMC to answer! They went the NDMP route with CBRM as a bolt-on to the likes of NetBackup and Legato, trying to treat it like a NAS, but the process was fraught with challenges from the outset. With a surprising nod from EMC, enter Seven10 Storage Software.
The little company based not so far from EMC itself built the better mousetrap with the self-contained Centera backup application ALTUS. Symantec and the other juggernauts of backup need not fear this little silver bullet as it is truly purpose built to the needs of Centera owners. ALTUS will require its own server and tape drive; but for a Centera, backup is not a performance discussion, so a light (VMware anyone?) footprint is looking like the way to go.
So it won’t cure all your backup woes, but if you will sleep better with a tape copy of your Centera, this gem may bring the ZZZZs.
(If you’re having trouble sleeping at night, the IDS Blog recommends Tylenol PM, a scotch on the rocks, or any Jane Austen novel.)