Category: Deduplication
Backup with Data Deduplication - A Conversation Beyond The Compression Ratio
November 30th, 2009
OK, deduplication technology is cool. It makes disk a viable target for longer term retention. Dedupe, however, is not the panacea of backup. I have been getting a lot of questions about deduplication from my customers and it's a fun topic to discuss. Being a professional pessimist, though, it's my job to play Debbie Downer at the dedupe party and say "hey now, let’s not lose sight of the fundamentals, people."
1. Reporting – It’s almost not worth doing a backup if you can’t prove it happened, or more importantly, why it didn’t. I appreciate we are not all waiting for the SEC to kick in the doors and look for our latest backup reports to determine if we are going to jail or not. Still, some products such as CommVault’s Simpana have a very nice native reporting tool with an option to do some very cool statistical trending that makes decisions around necessary throughput and media not so dependent on the crystal ball. Even if you are not looking for a complete solution overhaul, have already taken the Data Domain jump, or are just plain happy with BackupExec, there are tools that are remarkably functional at increasingly competitive prices, such as EMC’s Data Protection Advisor, which covers all the mainstream backup tools.

CommVault’s Data Growth Report from the SRM expanded reporting tools.

2. Integration with the platforms that drive your business – While all the big hitters are touting integration with the soon-to-be ousted VCB, some products have really stood out, such as Symantec’s NetBackup, which allows a single backup to provide both full machine and single file restore (video here). There are also a number of VMware-specific solutions that have introduced a flavor of dedupe into their technologies. Veeam is a standout here that is probably worth taking a look at, providing replication and backup with dedupe in a single, very cost-attractive product. This product also lends itself well to where I see next generation data protection going. See my rant "Why Back Up Your Business Data?"
3. Speed – Depending on how you implement your backups, deduped or not, you may still be racing the sun to get your servers backed up before your users show up to change all the files again. Avamar is a clear dedupe standout here since the method it uses to perform the dedupe usually results in ridiculously shorter backups (see Justin’s blog "Data Domain vs. EMC Avamar: Which deduplication technology is better?"). However, being able to add media servers under a single management interface to increase the sheer brute force of your solution, with or without deduplication, will keep you with the old standbys like Symantec’s NetBackup and CommVault’s Simpana.
Other criteria off the top of my head include:
1. Do you have application agents for MY applications?
2. Can you restart your backup / restore jobs from where they left off?
3. What is your bare metal solution?
4. Can you protect desktops with the same interface as the datacenter?
5. Do you have integrated archiving / compliance search?
6. How difficult is it to protect / recover the backup solution itself with history?
7. Can you multiplex / multi-stream backups for improved throughput?
8. Can I write to disk and tape at the same time?
9. How granular is your security construct?
10. Do you support my hardware with tools such as NDMP?
11. How well does your solution work with my firewall?
12. If I back up to disk, how do I cut tapes / restore from tapes?
13. Does your solution have CDP as well as regular backups?
14. Do I have to configure every server or does your solution leverage policies?
15. Does your solution manage encryption / encryption keys?
16. Does your solution push updates or do I manually update all the clients?
17. How functional is your GUI / command line interface, and what can / can’t I do from each?
This list could go on for another hundred points depending on the specific needs of your business, but I think we are now at the point where a deduplication conversation should circle back to what our dedupe vendor is bringing to the table beyond a compression ratio.
Deduplication Wars: EMC Avamar vs CommVault Simpana
June 2nd, 2009
OK, so you have decided that deduplication is the best thing ever and a must have for your backup needs. The next big question on the horizon is what KIND of deduplication is right for you. Two of the big hitters in the market today are EMC’s Avamar and CommVault’s Simpana products. Both products seem to be doing very well in the wild, and they approach deduplication in completely different ways.
In the case of Avamar, the product is deduplicating at the client using variable block deduplication. Once the scan is complete on the client server and the deduplication hashes are created, the client checks back with the Avamar Data Store appliance farm to see which blocks the farm has not seen, and then only the truly unique blocks across the environment (not just that server) are sent over the wire. This results in extremely high levels of deduplication AND remarkably fast backups, since very little data is normally left to send after deduplication and comparison to the rest of the environment. The data is stored on the EMC Data Store appliance, which presents a pretty simple GUI for recovery. The only major chink in the armor of Avamar is that it does not have the ability to natively create tapes for those data sets you may want to retain longer than you have space to keep on the appliance farm.
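To make the "hash at the client, ask the target what it is missing, send only that" flow concrete, here is a minimal Python sketch. It is an illustration of the general source-side dedupe idea only, not Avamar's actual algorithm or API: the chunker is a toy content-defined chunker, and every name in it (chunk_variable, DedupeStore, backup, restore) is made up for this example.

```python
import hashlib
import random


def chunk_variable(data, mask=0x3F, min_size=16, max_size=256):
    """Toy variable-block chunker: cut where a value built from the most
    recent bytes matches a bit pattern, so boundaries follow content."""
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling = ((rolling << 1) + byte) & 0xFFFFFFFF
        length = i - start + 1
        if (length >= min_size and (rolling & mask) == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start, rolling = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # leftover tail
    return chunks


class DedupeStore:
    """Hypothetical stand-in for the backup target: one copy per chunk hash."""

    def __init__(self):
        self.chunks = {}

    def missing(self, digests):
        # The client asks: which of these hashes have you never seen?
        return {d for d in digests if d not in self.chunks}

    def put(self, digest, chunk):
        self.chunks[digest] = chunk


def backup(data, store):
    """Chunk and hash locally, then ship only chunks the store lacks.
    Returns the recipe (ordered hashes) and the bytes actually sent."""
    chunks = chunk_variable(data)
    digests = [hashlib.sha256(c).hexdigest() for c in chunks]
    needed = store.missing(digests)
    sent = 0
    for digest, chunk in zip(digests, chunks):
        if digest in needed:
            store.put(digest, chunk)
            needed.discard(digest)  # do not resend a repeat within this job
            sent += len(chunk)
    return digests, sent


def restore(recipe, store):
    """Rebuild the original bytes from the ordered list of hashes."""
    return b"".join(store.chunks[d] for d in recipe)


# Two "servers" holding nearly identical data share one store.
rng = random.Random(42)
server_a = bytes(rng.randrange(256) for _ in range(5000))
server_b = server_a + b"a few bytes only server B has"

store = DedupeStore()
recipe_a, sent_a = backup(server_a, store)
recipe_b, sent_b = backup(server_b, store)
print("server A sent", sent_a, "of", len(server_a), "bytes")
print("server B sent", sent_b, "of", len(server_b), "bytes")
```

The second backup only ships the tail that the store has not already seen from the first server, which is the "unique across the environment, not just that server" effect described above.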
CommVault came at the deduplication process a completely different way and leveraged their existing tape archive construct to create a form of fixed block deduplication. In this case the clients do the same thing they always did: run their scans, package the data, and then shoot it out. Once the data gets to the Media Agent, the deduplication occurs and the data is spit out onto any disk target supported by CommVault. Since the deduplication is fixed block, the deduplication ratios are not as good as with variable block (as in Avamar), but certainly much better than typical compression. Since the deduplication occurs on the Media Agent, there are no savings in backup window time. The good news is that this is CommVault, and cutting tapes is its forte: it is completely automated, with the ability to have different retentions on each type of media to fit all your compliance desires within a single tool. Also, since the format of the media archive did not change, restores are just as fast with deduplicated data as they were with plain Jane backup to disk, which is huge if you have a lot of data to restore. Avamar can be sluggish in terms of restore on the smaller deployments but is still certainly functional for your everyday restore needs.
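Why would fixed block ratios trail variable block ratios? One common reason is that a small insertion shifts every fixed boundary after it, so previously stored blocks no longer line up. The Python sketch below is a toy comparison under my own assumptions (random data, a single inserted byte, a simplistic content-defined chunker); it does not reflect how Simpana or Avamar are actually implemented, and all function names are hypothetical.

```python
import hashlib
import random


def fixed_chunks(data, size=64):
    """Fixed-block chunking: cut every `size` bytes regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def variable_chunks(data, mask=0x3F, min_size=16, max_size=256):
    """Toy variable-block chunking: cut where a value built from the most
    recent bytes matches a bit pattern, so boundaries move with the content."""
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling = ((rolling << 1) + byte) & 0xFFFFFFFF
        length = i - start + 1
        if (length >= min_size and (rolling & mask) == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start, rolling = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def new_bytes(old, new, chunker):
    """Bytes in `new` whose chunks were not already stored from `old`."""
    seen = {hashlib.sha256(c).digest() for c in chunker(old)}
    return sum(len(c) for c in chunker(new)
               if hashlib.sha256(c).digest() not in seen)


# Yesterday's file, and today's file with one byte inserted at the front.
rng = random.Random(0)
old = bytes(rng.randrange(256) for _ in range(20000))
new = b"X" + old

fixed_new = new_bytes(old, new, fixed_chunks)
variable_new = new_bytes(old, new, variable_chunks)
print("fixed block must store   ", fixed_new, "new bytes")
print("variable block must store", variable_new, "new bytes")
```

Real workloads are kinder to fixed block than this worst case (many changes overwrite in place rather than insert), which fits the observation above that fixed block still beats plain compression comfortably.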
In the grand scheme, Avamar is the holy grail of backup speed since it only ever sends fractions of incremental data over the wire to the target, which reduces not only backup times but also the impact of backup on both the hosts and the network. I also give CommVault a major tip of the hat for how they leveraged their existing technology and morphed it into a deduplication technology that brings huge benefits to their current customer base while staying on the commodity hardware bandwagon.
Obviously there are many more features to both products worth investigating and comparing but now you know how the two differ technically in terms of the deduplication angle.
More: Video interview of IT manager who achieved a 20:1 ratio using deduplication technology