
5 comments

Comment from: Dan Gauld [Visitor] · http://www.mainland.ca
Nice post. Very informative. I'm wondering whether you have an opinion on which technology works best with databases. I hear Avamar DB restores are terrible.
11/09/09 @ 08:32
Comment from: Sean Livingstone [Visitor]
Some comments on yours. :-)

In the section "Block-based deduplication", the last statement is not 100% correct. Both will see the same block, but only Avamar will not back it up again. Data Domain will not STORE it again; however, the backup software used with Data Domain will back up the same data over and over, especially in full backups. So Avamar moves a duplicate segment only once during backup, while in the Data Domain case the associated backup application moves the same data every time. This is sort of mentioned in the section "Where does the deduplication happen?", but it is worth expanding on in the first section, just for clarity.
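To make the source-side versus target-side distinction concrete, here is a toy Python sketch. It is not either vendor's actual algorithm: it assumes fixed 8-byte blocks and SHA-1 fingerprints purely for illustration, and the function names are hypothetical. It counts how many bytes cross the network on two consecutive full backups of unchanged data.

```python
import hashlib

BLOCK_SIZE = 8  # toy block size; real products use much larger segments

def blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Split data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def fingerprint(block: bytes) -> str:
    """Identify a block by its hash."""
    return hashlib.sha1(block).hexdigest()

def source_side_backup(data: bytes, index: set[str]) -> int:
    """Client-side dedupe: look each fingerprint up first and send only blocks
    the backup server has never seen. Returns bytes sent over the network."""
    sent = 0
    for block in blocks(data):
        fp = fingerprint(block)
        if fp not in index:
            index.add(fp)
            sent += len(block)
    return sent

def target_side_backup(data: bytes, store: set[str]) -> tuple[int, int]:
    """Target-side dedupe: the backup application ships every block and the
    appliance stores only new ones. Returns (bytes sent, bytes stored)."""
    sent = stored = 0
    for block in blocks(data):
        sent += len(block)
        fp = fingerprint(block)
        if fp not in store:
            store.add(fp)
            stored += len(block)
    return sent, stored

# Two consecutive full backups of the same 40-byte "server" (2 unique blocks).
full = b"ABCDEFGH" * 4 + b"12345678"
src_index, tgt_store = set(), set()
src_runs = [source_side_backup(full, src_index) for _ in range(2)]
tgt_runs = [target_side_backup(full, tgt_store) for _ in range(2)]
print(src_runs)  # [16, 0]
print(tgt_runs)  # [(40, 16), (40, 0)]
```

Both approaches end up storing the same 16 unique bytes, but only the source-side model stops sending the data on the second full backup.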

In the section "Where does the deduplication happen?", it mentions that Avamar is rip-and-replace, which is correct for the servers that you will be protecting with Avamar. It is good to note, though, that Avamar can work alongside users' existing backup applications for the use cases where Avamar is not a good fit. Many customers have this environment working and use both to their advantage.

In the section "How do you expand?", it mentions that Avamar has parity overhead that a user has to be aware of. Data Domain uses RAID 6, which also has parity overhead (double parity, in RAID 6's case). This is not a bad thing, because it protects the backed-up data. The point is that both have parity overhead.

Thank you.
11/10/09 @ 09:32
Comment from: Roseanne Sullivan [Visitor]
Another difference between Avamar and Data Domain deduplication is in the area of replication. I am a Data Domain technical writer who recently took an Avamar Administration course and looked at their user guide, so I have a cursory familiarity.

- Replication in Data Domain depends on an optional, licensed software feature.
- Replication seems to be built into the Avamar server software.
- Avamar supports full (or root-to-root) replication, which creates a complete logical copy of an entire source server on the destination Avamar server. I'm not sure whether Data Domain Replicator enables root replication.
- It seems that replication in Avamar can run only after backups are finished. In Data Domain, replication can occur simultaneously.
11/10/09 @ 11:39

Dan - When it comes to databases and deduplication, your mileage truly will vary. I've got customers who are getting 97% commonality on full database backups and others who get under 40%. Both of those are extremes, and the rates you'll see will typically be somewhere in the middle. Your comment on Avamar database restores sucking is likely about remote office restores. Avamar is great because in many cases it will allow you to back up remote office servers directly over the wire to a central location, since it deduplicates the data before sending it across the WAN. The one thing to remember is that if you have to do a full database restore, the entire database must be copied across the wire, because none of the data exists at the remote site. Avamar does a great job with databases, but I always warn people to have realistic expectations around restore times over the wire and around the deduplication rates that databases will get compared with file servers.
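A back-of-the-envelope Python sketch shows why backups over the WAN can be fast while a full restore is slow. The 500 GB database, the 100 Mbit/s link, and the 80% link-efficiency factor are all made-up assumptions, and "commonality" is taken to mean the share of the backup that is already stored centrally and therefore never crosses the wire.

```python
def transfer_hours(gigabytes: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Hours to move `gigabytes` (decimal GB) over a link of `link_mbps` megabits/s,
    derated by an assumed protocol/contention efficiency factor."""
    bits = gigabytes * 1e9 * 8
    effective_bps = link_mbps * 1e6 * efficiency
    return bits / effective_bps / 3600

DB_GB = 500      # hypothetical database size
LINK_MBPS = 100  # hypothetical WAN link speed

for commonality in (0.97, 0.40):
    sent_gb = DB_GB * (1 - commonality)  # only non-common data crosses the WAN on backup
    print(f"backup at {commonality:.0%} commonality: {transfer_hours(sent_gb, LINK_MBPS):.1f} h")

# A full restore to the remote site has nothing local to reuse, so every byte moves.
print(f"full restore: {transfer_hours(DB_GB, LINK_MBPS):.1f} h")
```

With these invented numbers, the backup at 97% commonality finishes in well under an hour, while the full restore takes roughly 14 hours.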


Sean - You make a couple of great points, and I appreciate you providing some clarity. Your point on block-based deduplication is spot-on and is definitely a big point of differentiation between Avamar and Data Domain in terms of how much data is actually moved off the server being backed up.

Regarding the rip-and-replace aspect of Avamar, it definitely only applies to servers where you will be deploying Avamar. I have seen many people bring in Avamar for a specific subset of data such as remote office backups, VMware, etc. and leave their legacy backup software for other portions of the environment. Given the node-based architecture of Avamar, that makes it very easy for people to phase Avamar in and add nodes over time instead of doing a full rip-and-replace of their existing backup architecture.

Great point on the parity overhead of both Avamar and Data Domain as well. Data Domain has the RAID 6 overhead, and Avamar has both RAID 5 and RAIN overhead. I agree 100% that this is far from a bad thing. Since you're only storing each unique piece of data in your environment one time, maintaining the integrity of that data becomes even more critical. Whether it is Data Domain's Data Invulnerability Architecture or Avamar's use of RAIN and snapshots, both do an amazing job of making sure your backups are safe.
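As a rough illustration of how parity layers eat into raw capacity, here is a small Python calculation. The disk and node counts are hypothetical, and RAIN is simplified to a single parity member striped across nodes, which is an assumption and not Avamar's documented layout. The point is only that layered protection compounds multiplicatively.

```python
from fractions import Fraction

def usable_fraction(members: int, parity: int) -> Fraction:
    """Share of raw capacity left after parity for one protection layer."""
    return Fraction(members - parity, members)

# Hypothetical layouts chosen for illustration, not vendor specifications.
raid6 = usable_fraction(members=12, parity=2)  # double parity in one RAID group
raid5 = usable_fraction(members=6, parity=1)   # single parity inside each node
rain = usable_fraction(members=8, parity=1)    # RAIN modelled as single parity across 8 nodes
layered = raid5 * rain                         # the two layers compound

print(f"RAID 6, 12 disks:     {float(raid6):.1%} of raw capacity usable")    # 83.3%
print(f"RAID 5 + RAIN, 6x8:   {float(layered):.1%} of raw capacity usable")  # 72.9%
```

In practice, the deduplication ratio you achieve usually matters far more to effective capacity than these parity percentages.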


Roseanne - As you stated, Avamar replication is built in and does not require an additional license. Data Domain's replication does require an optional license on both sides. With Avamar, you can choose which backup sets are replicated and whether you want to replicate your daily, weekly, monthly, and/or annual backups. Replication with Avamar is scheduled as part of the backup job and happens once the backup completes.

In the Data Domain world, backups can be sent to different folders and then you choose which folders to replicate. It does not natively allow for granular replication of just your monthly or weekly backups. However, you could tell your backup software to put your monthly backups in a different folder and replicate only that folder.
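The two replication models differ mainly in where the selection decision is made. The Python sketch below is a toy model, not either product's API. In the tag-based model, each backup carries a retention class that replication filters on. In the folder-based model, the backup software routes each class to its own folder and only whole folders are replicated. The `Backup` type, the function names, and the `/backup/...` paths are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Backup:
    client: str
    retention: str  # "daily", "weekly", "monthly", or "annual"

def select_by_retention(backups: list[Backup], wanted: set[str]) -> list[Backup]:
    """Tag-based model: replicate only backups whose retention class is wanted."""
    return [b for b in backups if b.retention in wanted]

def route_to_folder(backup: Backup) -> str:
    """Folder-based model: the backup software writes each retention class
    to its own (hypothetical) folder on the appliance."""
    return f"/backup/{backup.retention}"

def select_by_folder(backups: list[Backup], replicated_folders: set[str]) -> list[Backup]:
    """Only backups that landed in a replicated folder are copied offsite."""
    return [b for b in backups if route_to_folder(b) in replicated_folders]

jobs = [Backup("db01", "daily"), Backup("db01", "weekly"), Backup("db01", "monthly")]
print([b.retention for b in select_by_retention(jobs, {"weekly", "monthly"})])  # ['weekly', 'monthly']
print([b.retention for b in select_by_folder(jobs, {"/backup/monthly"})])       # ['monthly']
```

Both models can end up replicating only the monthly backups. The difference is that the folder-based model needs the routing rule to be set up in the backup software beforehand.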

Those are two more distinct differences that weren't mentioned in my original post. Thanks for bringing them up.
11/10/09 @ 19:57
Comment from: Jedidiah Yueh [Visitor] · http://www.delphix.com
Justin, great summary of the differences. I'm the founder of Avamar, and it's always eye-opening to find people who understand the technology well enough to communicate competitive differentiation.

A quick note on databases in the comments section: variable segmentation doesn't help for databases. Databases store data in blocks inside files. Those blocks are fixed-size (often 8K), and they have structure that resists de-duplication (unique IDs, transaction record log information, integrity check information, actual user data, etc.). These "structural wrappers" decrease the efficiency of de-duplication solutions. While you will still see day-over-day (e.g., snapshot) and compression benefits, you will rarely see global de-duplication benefits. As a result, databases tend to hog storage in de-duplication solutions and create capacity management challenges (you need global de-duplication benefits to outweigh the overhead of de-duplication).
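To see why "structural wrappers" defeat global de-duplication, here is a toy Python model of 8K database pages. The page layout (a 16-byte header holding a page ID, a log sequence number, and a CRC of the body) is invented for illustration and does not match any real database format. Every page carries identical user data, yet no two pages share a fingerprint. Unchanged pages from one day to the next, however, still de-duplicate.

```python
import hashlib
import struct
import zlib

PAGE_SIZE = 8192                     # 8K page, as in the comment above
HEADER_FMT = "<QII"                  # page ID, log sequence number, CRC32 (toy layout)
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 16 bytes

def make_page(page_id: int, lsn: int, payload: bytes) -> bytes:
    """Build one fixed-size page: unique header followed by zero-padded payload."""
    if len(payload) > PAGE_SIZE - HEADER_SIZE:
        raise ValueError("payload does not fit in one page")
    body = payload.ljust(PAGE_SIZE - HEADER_SIZE, b"\0")
    header = struct.pack(HEADER_FMT, page_id, lsn, zlib.crc32(body))
    return header + body

def unique_blocks(pages: list[bytes]) -> int:
    """Number of distinct page fingerprints, i.e. blocks a deduper must store."""
    return len({hashlib.sha256(p).digest() for p in pages})

payload = b"customer=ACME;status=active"  # identical user data in every page
pages = [make_page(page_id=i, lsn=1000 + i, payload=payload) for i in range(100)]
print(unique_blocks(pages))                            # 100: headers make every page unique
print(unique_blocks([p[HEADER_SIZE:] for p in pages]))  # 1: without headers the data dedupes

# Day-over-day: unchanged pages are byte-identical, so day 2 adds only the changed page.
day2 = pages[:]
day2[0] = make_page(page_id=0, lsn=2000, payload=payload)
print(unique_blocks(pages + day2))                     # 101
```

A real deduplication engine would not strip page headers, so the 100-to-1 gap here illustrates the lost global benefit rather than an achievable saving.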

While today's solutions will work OK, databases are unique applications that really need unique solutions.
11/16/09 @ 18:47
