Justin's Corner

Justin gets into the thick of things

Data Domain vs. EMC Avamar: Which deduplication technology is better?

November 9th, 2009

Data Domain vs. Avamar - Which is better?

Now that EMC owns both Data Domain and Avamar, I am constantly being asked which technology is better. Before the Data Domain acquisition, it was tough to get a straight answer because the two deduplication giants were constantly slugging it out and slandering each other to try and find an edge and gain more market share. With the two technologies now living under the same umbrella, sometimes it is hard to tell where one technology ends and the other begins.

Both Avamar and Data Domain have their pros and cons and the niches where they fit best, as well as places where they shouldn’t be deployed. If you’re reading this post, you’re probably trying to figure out which technology would be the best for you. To help sort the sales fluff from the truth, I have summarized the similarities and differences between the two products so you can figure out which one best fits your environment.

First off, the two products share some common attributes, so let’s look at those first.

Block-based deduplication

Rather than just deduplicating full files that are exactly the same, Avamar and Data Domain will break files apart into small blocks and compare those blocks to ones that have already been backed up. Each unique block only needs to be backed up once within your environment. So let’s say you have a 100MB PowerPoint presentation that has your name on the front slide and you send it to 9 other people and each person makes a small change and puts their name on the front slide. Traditional backup technologies will see each one as a new file and back up 10 copies of the file, each 100MB in size. Avamar and Data Domain will break the file apart into blocks, see that only a small portion has changed, and back up just the changes.
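To make the PowerPoint example concrete, here is a minimal Python sketch of block-level deduplication. It is an illustration only and not how Avamar or Data Domain are implemented: the 4KB block size, the SHA-256 fingerprints, the `BlockStore` class and the scaled-down "deck" are all assumptions of mine.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative block size, not a vendor value


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut data into consecutive blocks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class BlockStore:
    """Keeps one copy of each unique block, keyed by its SHA-256 digest."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def backup(self, data: bytes) -> int:
        """Store any block not seen before; return the bytes newly stored."""
        new_bytes = 0
        for block in split_blocks(data):
            fingerprint = hashlib.sha256(block).hexdigest()
            if fingerprint not in self.blocks:
                self.blocks[fingerprint] = block
                new_bytes += len(block)
        return new_bytes


def demo() -> tuple[int, int]:
    """Back up ten near-identical decks; return (logical bytes, stored bytes)."""
    # Shared slide content: 100 distinct 4KB blocks standing in for the deck.
    body = b"".join(
        hashlib.sha256(str(i).encode()).digest() * (BLOCK_SIZE // 32)
        for i in range(100)
    )
    store = BlockStore()
    logical = stored = 0
    for person in range(10):
        # Only the title slide (one block) differs from person to person.
        title = f"Presenter {person}".encode().ljust(BLOCK_SIZE, b" ")
        deck = title + body
        logical += len(deck)
        stored += store.backup(deck)
    return logical, stored
```

In this sketch the first deck is stored in full and each of the other nine adds only its own title block, so ten copies take up barely more space than one.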

Variable length deduplication

Both Avamar and Data Domain utilize variable-length deduplication rather than fixed-length dedupe. What does that mean in layman’s terms? Fixed-length deduplication always looks for segments of the same size when looking for common data. So if you use a 128K fixed block, it will look at the first 128K of the file, then the second 128K of the file, and so on looking for common data. Variable-length deduplication takes a more intelligent approach and can vary the size of the segment when it is looking for commonality. This means if small changes are inserted into the middle of a file, it is smart enough to pick out just those changes and still see the common data around them. When changes are inserted into the middle of a file with fixed-length deduplication, the data will typically shift and the remainder of the file can often be seen as all new data. This is why you will typically see much higher commonality rates with Avamar and Data Domain than with most of the competition.
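The shifting problem is easy to see in a small experiment. The Python sketch below compares fixed-length chunks with a simple content-defined (variable-length) chunker after six bytes are inserted near the start of a file. The chunker here cuts wherever a checksum of the last 16 bytes hits a chosen value; that rule, the window size, the divisor and the random test data are assumptions for illustration, not the algorithm either product actually uses.

```python
import hashlib
import random
import zlib

WINDOW = 16   # bytes examined when deciding on a cut point (illustrative)
DIVISOR = 64  # gives roughly one cut every 64 bytes on random data


def fixed_chunks(data: bytes, size: int = 64) -> list[bytes]:
    """Cut at fixed offsets, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def variable_chunks(data: bytes) -> list[bytes]:
    """Cut wherever a checksum of the trailing WINDOW bytes is a multiple of DIVISOR."""
    chunks, start = [], 0
    for end in range(WINDOW, len(data) + 1):
        if zlib.adler32(data[end - WINDOW:end]) % DIVISOR == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])  # whatever is left after the last cut
    return chunks


def reused_fraction(old: list[bytes], new: list[bytes]) -> float:
    """Fraction of the new file's bytes that sit in chunks already stored."""
    known = {hashlib.sha256(c).digest() for c in old}
    reused = sum(len(c) for c in new if hashlib.sha256(c).digest() in known)
    return reused / sum(len(c) for c in new)


def compare_chunkers() -> tuple[float, float]:
    """Return (fixed-length reuse, variable-length reuse) after a small insert."""
    rng = random.Random(0)
    original = bytes(rng.getrandbits(8) for _ in range(20000))
    edited = original[:1000] + b"INSERT" + original[1000:]
    fixed = reused_fraction(fixed_chunks(original), fixed_chunks(edited))
    variable = reused_fraction(variable_chunks(original), variable_chunks(edited))
    return fixed, variable
```

With fixed-length chunks, everything after the insert shifts by six bytes and stops matching, so only the chunks in front of the edit are reused. Because the variable-length cut points depend on the content itself, they line up again shortly after the inserted bytes and most of the file is recognized as already stored.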

That is where the similarities end and the differences begin. Here are some key areas where the two products differ:

Where does the deduplication happen?

Data Domain utilizes target-based deduplication. The Data Domain appliance is simply a disk target that you point your backup software at. Backups leave the server in their full format and are deduplicated on the fly as they hit the Data Domain appliance. The data flowing out of the server and across the network is not reduced, but the amount of data stored on disk is reduced significantly.

On the other side, Avamar utilizes source-based deduplication. Since Avamar is both the backup software and backup-to-disk target, it can actually deduplicate the data before it leaves the server. This means that files are broken apart and deduplicated before any backup data is sent across the network. Only the changed blocks are sent across the network to the backup-to-disk target. This results in a reduction in network traffic, the amount of data stored on disk, and also the time it takes you to back up.
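The difference between the two approaches shows up in what crosses the network, not in what ends up on disk. Here is a rough Python sketch of a second night's backup where one block out of 100 has changed. The `Appliance` class, the "which fingerprints are you missing?" exchange and the 1KB block size are hypothetical and only model the idea; they are not the actual Avamar or Data Domain protocols.

```python
import hashlib

BLOCK = 1024  # illustrative block size


def blocks(data: bytes) -> list[bytes]:
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def digest(block: bytes) -> bytes:
    return hashlib.sha256(block).digest()  # 32-byte fingerprint


class Appliance:
    """Hypothetical backup-to-disk target that stores unique blocks."""

    def __init__(self) -> None:
        self.store: dict[bytes, bytes] = {}

    def ingest_full_stream(self, data: bytes) -> int:
        """Target-based: the full backup arrives, then is deduplicated."""
        for block in blocks(data):
            self.store.setdefault(digest(block), block)
        return len(data)  # bytes that crossed the network

    def missing(self, digests: list[bytes]) -> list[bytes]:
        return [d for d in digests if d not in self.store]

    def ingest_blocks(self, new_blocks: dict[bytes, bytes]) -> None:
        self.store.update(new_blocks)


def source_side_backup(data: bytes, appliance: Appliance) -> int:
    """Source-based: send fingerprints first, then only the blocks the target lacks."""
    by_digest = {digest(b): b for b in blocks(data)}
    wanted = appliance.missing(list(by_digest))
    appliance.ingest_blocks({d: by_digest[d] for d in wanted})
    return 32 * len(by_digest) + sum(len(by_digest[d]) for d in wanted)


def compare_second_night() -> tuple[int, int, int, int]:
    """Return (target wire bytes, source wire bytes, blocks stored by each) for night two."""
    day1 = [digest(str(i).encode()) * (BLOCK // 32) for i in range(100)]
    day2 = list(day1)
    day2[50] = digest(b"changed") * (BLOCK // 32)  # one block changes overnight
    target, source = Appliance(), Appliance()
    for night in (day1, day2):
        data = b"".join(night)
        target_wire = target.ingest_full_stream(data)
        source_wire = source_side_backup(data, source)
    return target_wire, source_wire, len(target.store), len(source.store)
```

Both appliances end up holding the same 101 unique blocks. On the second night, though, the target-based path still moves the full 102,400 bytes across the network, while the source-based path moves 4,224 bytes (the fingerprints plus the one changed block).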

What is included?

Data Domain was designed to integrate into any existing backup environment with very little effort. You still utilize your existing backup software and just point to the Data Domain appliance as a backup target.

Avamar is a rip-and-replace for your current backup environment as it includes both backup software and a B2D appliance. You will remove your current backup agent from your servers and load the Avamar agent in its place. This agent is how Avamar is able to deduplicate at the server level.

How do you expand?

Data Domain comes as an appliance with disk built-in. Depending on the model, you can expand by adding drives until you hit the maximum amount for that model. When you reach the maximum capacity for the model, you must purchase a new unit to upgrade. A Data Domain gateway product is also available which allows you to use your existing storage behind it.

Avamar was designed as a node-based grid solution. Each node has a specific capacity and if you need more backup storage, you add more nodes to your grid. Data is striped within the nodes and also striped across the nodes for additional protection. This means your data is safer, but there is also a parity overhead to be aware of.

Do you really need tape?

Both Avamar and Data Domain allow you to back up to an appliance and then replicate the data offsite to a sister appliance. Now that you have your backups geographically dispersed, do you really need tape? Most people will say no because tape becomes more of a liability and security risk than anything if you already have your backups offsite. However, sometimes the strictest of compliance departments will absolutely require tape even if the backups are stored offsite on disk. If that is the case, you have some decisions to make.

Data Domain doesn’t have any native tape-out functionality, but that is by design. Since you’re using your existing backup software to push data to the Data Domain, you would use that software to push backups off to tape. It is as simple as doing a copy job and copying your backup set to another medium.

Avamar was originally designed as a completely tapeless solution. The backup data is pushed to the Avamar appliance and then replicated offsite. However, as Avamar became more popular and people with tape requirements wanted to jump on the bandwagon, EMC began developing a “tape-out” functionality. The first release of Avamar tape-out was basically a script to do a rolling restoration of your backups to a proxy server and then use another backup software to move that restored backup to tape. Not the prettiest scenario in the world, but if you wanted to move your backups to tape monthly, it was serviceable. In the next release of Avamar, the tape-out functionality is being totally re-written and looks much more promising. Stay tuned for more information on this as it becomes available.

So when you’re evaluating which deduplication technology is the best for you, make sure to consider your need for tape in the decision. Tape with Data Domain is much simpler, but Avamar is working to get there too.

There are many other similarities and differences between the two products, but those are some of the heavy hitters. So let’s take a few specific use-cases and see which technology fits the best:

I need to decrease my backup time – If your backups are quickly growing out of your backup window, you need to find a way to back up more data in less time. Data Domain could speed up your process since backing up to disk is faster than tape. However, you are still sending your backup data in its full format, so that change will likely be minor. Avamar will deduplicate at the server and send only the changed blocks across the network, which will drastically decrease your backup time.

Advantage: Avamar

I love my existing backup software and just want to decrease my backup footprint – If your existing backup software is working great and you’re just looking to do backup-to-disk with deduplication, Data Domain is the perfect solution for you. Simply plug in the Data Domain appliance, point your backup software to it, and your backups will be deduplicated as they hit the appliance. You can also replicate offsite if needed. Avamar requires removal of your existing backup software and isn’t a great fit for this scenario.

Advantage: Data Domain

I need a better way to back up my remote offices – If you have data at your remote offices and you want to centralize your backups without having huge WAN links, deduplication is a must. Data Domain can help by putting a small appliance at each site and replicating into a large appliance to centralize the backups. Avamar can take it one step further because it deduplicates data before sending across the wire. Because of this, in some situations, you can just put agents at the remote sites and send only the changed blocks across the WAN to your centralized site. This allows you to avoid putting backup hardware at remote sites.

Advantage: Avamar

My Compliance Department requires that I use tape on a weekly basis – If replicating your backups offsite isn’t enough and you must use tape on a regular basis, Data Domain will easily allow you to push to tape by utilizing your existing backup software. Avamar has methods to push to tape that were mentioned above, but they aren’t designed for frequent use.

Advantage: Data Domain

So what does all of this mean? The moral of the story is that there isn’t a one-size-fits-all backup solution in the market right now. You really need to know exactly what you want to accomplish and compare that with the product feature sets to determine which is the best fit. The key is understanding what each product has to offer and how that fits into your environment.

Tags: acquisition, backup, backup agent, backup target, best, better, block based, commonality, compliance, cons, data domain, data duplication, deduplication, differences, disk, emc avamar, fixed length, functionality, load, network, node, offsite, pros, replication, server, similarities, storage, tape, target based, technology, variable length

Will your data benefit from deduplication? Find out by testing dedupe rates with EMC CAT tool download.

July 7th, 2009

Is deduplication really all it’s cracked up to be?

With everyone in the industry talking about deduplication, you can’t go 2 minutes without hearing how great it is and outlandish claims regarding deduplication rates. So the question is… is dedupe really all it’s cracked up to be? The answer isn’t really in the deduplication technology itself. It’s actually in the make-up of the data you’re looking to deduplicate. So how do you know if dedupe is the right technology for you?

Do you have a ton of highly-compressed images or multimedia files? These aren’t the ideal data types for deduplication.

Does your environment contain a lot of large databases like SQL, Exchange, and Oracle? If that is the case, dedupe can help, but not as much as those crazy marketing numbers say.

Do you have large File Servers? Lots of VMware? Remote offices you need to back up? This is where dedupe really shines and you can really add some efficiencies in your environment. It is also where those numbers like 200:1 or 500:1 come from, and they can actually be beaten in some cases.

Now of course your data doesn’t nicely fit into just one of those categories above. It likely spans two of them, if not all three. So we’re back to the original question…is dedupe the right technology for you? One of the keys to deploying an *effective* deduplication solution is to know where to deploy it, how to deploy it, and what to expect.

Just because you can deduplicate your live VMware or database environment doesn’t necessarily mean you should. There are a lot of implications to trying to deduplicate data that is frequently accessed and performance can severely suffer in some cases. While dedupe is a great technology, it can bring your environment to its knees if implemented incorrectly. I’ll address this in another post because that is a whole different topic. Today I want to focus on how to figure out if, and how, deduplication can benefit your business in the backup process.

Knowing where to deploy deduplication and what rates to expect can really only be determined through an assessment of your current environment. EMC has a great tool called the Commonality Assessment Tool (CAT) that will allow you to look at a subset of your data and see exactly what the commonality is. This tool can be downloaded for free from the IDS website.

So what is this tool and what can you expect from it? EMC offers a deduplication solution called Avamar, which is backup software and a backup-to-disk appliance all wrapped into one. The CAT Tool is essentially a modified Avamar client that will perform a simulated backup on your server(s) and instead of actually backing the data up, it just tracks what the deduplication rate is and how long the actual backup would have taken. I’ll quickly take you through the process of running this tool and show you how easy it is to figure out exactly how much commonality is in your data.

***One important thing to note before beginning is that the CAT Tool has the same impact on your system as a normal backup client. It is recommended to run it off-hours and not during your regular backup window.

  1. Download the CAT Tool from the IDS website. A link will be emailed to you where you can download a zip file containing the tool.
  2. Extract the zip file to C:\Avasst. The directory will contain avtar.exe and avasst.exe. *Note: This directory must exist or the tool will not run correctly.
  3. Open a command prompt by going to Start/Run and running cmd.exe.
  4. Browse to the CAT directory by typing “cd\Avasst”
  5. Run the CAT tool by typing “avasst”
  6. You will be prompted to select a folder to scan. In this example, I will scan the D Drive by entering “d:”. If you want to scan multiple folders at once, see the notes at the end.
  7. The first time you run the tool, you can expect it to take approximately 1 hour per 100GB of data and 1 hour per million files. However, subsequent runs will be much quicker due to deduplication.
  8. When the tool has completed running, you will see the following screen:
  9. Now if you look in the c:\Avasst folder, you will see several files that are tracking the deduplication rates and backup times for your data. They are just raw data and need to be run through a tool in order to interpret the results.
  10. In order to see the full benefits of deduplication, you will want to run this tool against the same dataset several times (at least 3). You can also run it across several different datasets to see commonality across several servers. Since the commonality tracking is stored locally in the c:\Avasst folder, you will want to mount directories from other servers and scan them from this server across the network.
  11. When you have scanned your datasets, zip the results and send them to your IDS Engineer to have the results interpreted.
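If you want a rough idea of how long the first scan will take before you schedule it, the rule of thumb from step 7 can be turned into a quick calculation. This Python sketch assumes the two figures simply add together, which is my reading of the guideline and not something the tool documents; the function name is made up for the example.

```python
def estimate_first_run_hours(total_gb: float, file_count: int) -> float:
    """Rough first-scan estimate: 1 hour per 100GB plus 1 hour per million files.

    Assumption: the two rule-of-thumb figures are additive. Treat the result
    as a planning number, not a measurement.
    """
    if total_gb < 0 or file_count < 0:
        raise ValueError("size and file count must not be negative")
    return total_gb / 100 + file_count / 1_000_000
```

For example, `estimate_first_run_hours(500, 2_000_000)` gives 7.0 hours for a 500GB file server holding two million files. Later runs against the same dataset should come in well under that.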

Some other notes on the CAT Tool:

  • If you want to scan data from other servers, you can simply mount another server’s drive to a drive letter on the local server and scan that drive.
  • In a real deduplication solution, all data will be deduplicated globally against other servers and backup sets. Since the assessment tool tracks deduplication locally, you will need to scan all datasets from the same server to see global deduplication benefits.
  • The CAT Tool can easily be scheduled using the built-in Windows Scheduler. Those instructions are included in a Word document included with the CAT Tool download.
  • If you want to scan multiple folders at once, you will need to create a silent file that contains the folders you want to scan. Simply create a file named “Silent” with no extension in the c:\Avasst folder. Inside of that file, just put a line for each drive or folder you want to scan.

**Note that you cannot end any entries with a backslash. For example, C: and D:\Users are valid, but C:\ and D:\Users\ are invalid.
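If you are building the Silent file for a lot of folders, a few lines of Python can write it and catch the trailing-backslash mistake before the tool does. This is only a convenience sketch: the helper names are made up, and it assumes the file is plain text with one entry per line, as described above.

```python
from pathlib import Path


def validate_entry(entry: str) -> str:
    """Reject entries the tool would not accept (per the note above)."""
    entry = entry.strip()
    if not entry:
        raise ValueError("empty entry")
    if entry.endswith("\\"):
        raise ValueError(f"entry must not end with a backslash: {entry!r}")
    return entry


def write_silent_file(folder: str, entries: list[str]) -> Path:
    """Write a file named 'Silent' (no extension) with one entry per line."""
    lines = [validate_entry(e) for e in entries]
    target = Path(folder) / "Silent"
    target.write_text("\n".join(lines) + "\n")
    return target
```

On the server you would call it as `write_silent_file(r"C:\Avasst", ["C:", r"D:\Users"])`. Passing `"C:\\"` or `"D:\\Users\\"` raises a `ValueError` instead of producing a file the tool would not accept.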

Tags: avamar, backup, cat, commonality, compression, data, database, dedupe, deduplication, download, emc, exchange, guide, how to, installation, oracle, rates, ratio, review, software, sql, storage, technology, test, vmware

Demo Center for EMC and VMware technologies enables customers to test virtualization and storage solutions.

May 2nd, 2009

What is an EMC/VMware Center of Excellence?

For several years, EMC and VMware have been working hand-in-hand to provide customers with solutions to drive costs out of the Data Center and provide the highest levels of performance and availability. Taking the next step to demonstrate their solutions for the virtual Data Center, EMC and VMware have teamed up with strategic Integrators to create eight Centers of Excellence across the United States. Integrated Data Storage is proud to be chosen as one of these Centers of Excellence and the first in the Midwest.

The IDS Center of Excellence comprises a Demo Center which showcases EMC and VMware’s leading technology solutions, and a staff of highly trained Engineers with field experience in the integrated solution offerings. The combination of VMware’s virtualization technology, EMC’s information infrastructure solutions, and IDS’s expertise in architecture and implementation creates a unique opportunity for customers to see working implementations of the solutions they can use to increase the return on technology investments.

The Demo Center contains a full lab of EMC and VMware technologies including:

  • VMware Virtual Infrastructure with VMotion, DRS, and VM HA
  • EMC Celerra Unified Storage platform
  • EMC Celerra Virtual Storage Appliance (VSA)
  • EMC Replication Manager for application integrated snaps and clones
  • Ontrack Power Controls for Exchange single-message restore
  • VMware backup with EMC Avamar for source-based de-duplicated backups
  • VMware backup with EMC Networker and Disk Library 3D 1500 for target-based de-duplicated backups
  • Disaster Recovery using EMC Celerra Replicator and VMware Site Recovery Manager
  • Virtual Desktop Infrastructure with VMware View
  • EMC Storage Viewer plug-in to view your EMC Storage through vCenter

With all of these technologies on display, customers can come in and witness how the solutions work in-person. Customers evaluating solutions for their virtual Data Center are encouraged to come in and test drive the technologies with the help of an IDS Engineer to see the EMC and VMware integration first-hand.


©2010 by Justin Mescher