EMC Avamar Replication Guide: Sizing Your Bandwidth Properly for Offsite Backup
November 5th, 2009
Avamar: Tape Backup Alternative
One of the most powerful features of Avamar software is its ability to replicate backup sets to another Avamar grid, providing offsite backups without the hassle of cutting and handling tapes. Avamar’s deduplication reduces not only the storage required but also the bandwidth needed to replicate what is functionally the equivalent of a full backup. While the reduction in bandwidth is dramatic, the changed bytes still have to be sent across a WAN link. Sizing that bandwidth properly is critical to the effective operation of the Avamar system.
Calculate Change Rate with Script
The first thing you need to determine is your daily change rate. The easiest way to do this is to use the capacity.sh script, which is included in an optional bundle of scripts that can be obtained from Avamar support or your IDS engineer. The script reports the total bytes added, bytes expired and daily net change for each of the last 30 days, along with averages of these values over the same period. The average bytes added is the figure you need for the bandwidth calculation.
Backup/Replication Best Practices
Avamar recommends that replication run from 10pm to 6am. This ensures that backup sets completed earlier in the backup window can be replicated the same day. The challenge is that replication is only performed on complete backup sets, so if you have a backup that runs for several hours after 10pm, replication of that backup set will run the following day. The other consideration is that Avamar recommends that replication be completed before daily maintenance operations run, and that backup and replication operations not run during daily maintenance. Avamar’s best practices guide recommends that you size your replication bandwidth so that a typical day's changes replicate within 4 hours, leaving a 4-hour buffer in the 8-hour window for spikes. This is an important consideration, so you don’t fall behind.
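The timing rule above can be sketched in a few lines of Python. This is only a simplified model of the rule as described in this post, not Avamar's actual scheduler: it assumes a fixed 10pm start, and the function name `replication_window_start` is made up for illustration.

```python
from datetime import datetime, time, timedelta

# Assumption: replication starts at 10pm every night, as recommended above.
REPLICATION_START = time(22, 0)

def replication_window_start(backup_end: datetime) -> datetime:
    """Return the start of the first replication window that opens
    at or after the moment the backup set is complete."""
    window = datetime.combine(backup_end.date(), REPLICATION_START)
    if backup_end > window:
        # The backup was still running at 10pm, so it waits for the next night.
        window += timedelta(days=1)
    return window

# A backup that finishes at 9:30pm replicates that same night...
early = replication_window_start(datetime(2009, 11, 5, 21, 30))
# ...while one that finishes at 1am waits until the following night.
late = replication_window_start(datetime(2009, 11, 6, 1, 0))
print(early, late)
```

Running your own backup completion times through a check like this shows which clients will always lag a day behind offsite.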
Bandwidth Sizing Example
Let's walk through a bandwidth-sizing example. Suppose the capacity.sh script shows a daily average addition of 50GB. To replicate this change rate reliably, the goal is to have it replicated in 4 hours. This requires 12.5GB/hr, or about 3.5MB/sec. This needs to be converted to bits, so multiply by 8: 3.5MB/s * 8 = 28Mb/s. Latency and protocol overhead need to be accounted for; 20 to 25% overhead is a good rule of thumb. 28Mb/s + 20% = 34Mb/s. For this example a T3, at roughly 45Mb/s, would provide sufficient bandwidth to replicate the daily average addition of 50GB.
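The arithmetic above is easy to rerun with your own numbers. Here is a minimal Python sketch of it; the function name `required_mbps` and its parameters are my own, and it assumes 1GB = 1024MB (use 1000 if you prefer decimal units, which gives a slightly lower figure).

```python
T3_MBPS = 44.736  # nominal T3 (DS3) line rate in megabits per second

def required_mbps(daily_added_gb: float,
                  window_hours: float = 4.0,
                  overhead: float = 0.20,
                  mb_per_gb: int = 1024) -> float:
    """Link speed (Mb/s) needed to replicate daily_added_gb in window_hours."""
    mb_per_sec = daily_added_gb * mb_per_gb / (window_hours * 3600)
    mbit_per_sec = mb_per_sec * 8          # bytes to bits
    return mbit_per_sec * (1 + overhead)   # pad for latency and protocol overhead

needed = required_mbps(50)
print(f"{needed:.1f} Mb/s needed; fits on a T3: {needed <= T3_MBPS}")
```

With 50GB of average daily additions this comes out at about 34Mb/s, matching the hand calculation. Setting `overhead=0.25` gives the more conservative end of the rule of thumb.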
EMC Avamar Review from the Field: Backup Made Better with Source-Based Deduplication
July 15th, 2009
Avamar Deduplication: Notes from the Field
Companies and organizations often struggle to fit backups into their backup windows, which raises concerns about the overall reliability of those backups. EMC Avamar is a powerful backup and recovery system that can provide your organization with the reliability of daily full backups in a fraction of the time required by a traditional tape- or disk-based backup solution.
How is this accomplished? Through the use of source-based deduplication technology. This post provides a couple of recent real-world examples of how powerful this product is, and how it makes dramatic reductions in backup windows possible.
The first example is with backing up Microsoft Exchange. This backup agent is provided at no charge with the purchase of Avamar software. (As a matter of fact, all supported clients and application backup agents are included at no charge, but I digress...) Avamar's Exchange agent can back up the Information Store as well as individual mailboxes. The brick-level mailbox backup is the focus of this example.
A Midwest-based customer recently purchased Avamar from IDS and engaged IDS to implement the backup solution for their two data centers. The first backup pass of the Exchange Brick-Level backup took 20 hours to complete. The information store was approximately 80GB with 120 users and 500,000 total objects. The last successful backup of the Exchange mailboxes previously took 13 hours to complete with a popular departmental backup product. The second pass of this backup, scheduled 24 hours later, took 2 hours and had a commonality ratio of 99.7% (meaning the second full backup consumed only 225MB of disk space on the Avamar grid).
The second example is where Avamar really shines. A municipal government technology office purchased replicating Avamar grids and engaged IDS to implement the backup solution. This organization had a file server with 650GB of mixed office files and application data. The initial backup of this server took 23 hours and reduced the data required to store the full backup by 65% on the initial pass. The second full backup of this server took less than 2 hours and required only 0.4%, or 2.6GB, of storage space on the Avamar grid. This technology and the ability of the system to replicate their data off-site to a secondary data center will allow the customer to recover their investment by eliminating their current costly off-site backup service, while providing faster, more reliable restores.
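The storage figures in both examples follow directly from the commonality ratio, and a rough Python sketch makes the relationship explicit. The function name `new_data_gb` is my own, and I am treating the file server's 0.4% figure as a 99.6% commonality ratio, which is my reading of the numbers rather than something Avamar reported.

```python
def new_data_gb(source_gb: float, commonality: float) -> float:
    """Data (in GB) from one full backup that is not already on the grid."""
    return source_gb * (1.0 - commonality)

file_server = new_data_gb(650, 0.996)  # 0.4% of 650GB was new
exchange = new_data_gb(80, 0.997)      # the 80GB store size was only approximate
print(f"File server: {file_server:.1f} GB, Exchange: {exchange:.2f} GB")
```

The file server works out to 2.6GB, as reported. The Exchange estimate of about 0.24GB lands in the same range as the 225MB the customer saw, with the gap explained by the store size being approximate.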
Stupid (But Useful) Avamar Tricks: Are you having challenges sorting your activity monitor screen by start or end time? Try Shift+Click on the heading you want for descending sort.
Newly found features: Avamar 4.1 SP2 (v4.1.33) was released on June 30th; here are a couple of interesting tidbits I have recently found. Client-based overrides: you can override the encryption setting and/or the dataset on a per-client basis (this is particularly useful if you want to schedule a server with special excludes or file lists alongside the rest of the servers in a group).
Five Key Elements to Good Data Storage Documentation
May 12th, 2009
Having good data storage documentation makes you and your team more effective and efficient. Most importantly, though, good storage documentation puts all the information you need at your fingertips when something goes wrong (which, at some point, it will). What does good documentation look like, then? Data storage documentation, whether you create it yourself or your vendor provides it, should possess five key elements.
1. Ideally, your documentation should start with a descriptive overview of your environment. This is important for new team members, managers and consultants, allowing them to quickly familiarize themselves with your environment and what you are trying to accomplish with your infrastructure. From there we get into the meat of the documentation.
2. The document that you will probably reference the most is the connectivity map. This document, often an MS Visio diagram, should visually describe how each of the devices (servers, storage, switches, tape/virtual tape, etc.) in your storage environment is connected. These diagrams can be accurate down to the ports on the Fibre Channel switches and individual devices. It is often useful to include a chart on this diagram that lists management interface IP addresses and device WWPNs. A storage layout document, included on or near this diagram, provides a physical perspective on the disk groups. Maintaining the accuracy of these documents is a key factor in being able to plan, troubleshoot and maintain your storage environment.
3. The configuration detail section should have all the nitty-gritty details for each storage, connectivity and host device in your SAN environment. Much of this can be gathered with manufacturer-provided tools and then compiled; however, key elements to record are management interface information (IP, user(s), passwords), driver versions, firmware versions, and software versions. Other items to record include WWPN addresses and switch port connections.
4. Regardless of the size of your environment, you should maintain a change-log. At a minimum, record when a change was made, who made it, and what was done. This will prove invaluable when something breaks or something is "fixed." The change-log can provide critical insight during failure analysis or when troubleshooting performance problems. Alongside the change-log, keep procedural guides, which provide a quick way to refresh your memory or assist new team members with tasks that are not frequently performed. Whether it is configuring a new server on the SAN, setting up a new replication consistency group or adding drives to your NAS, documenting the steps and using that document as a checklist provides consistent, repeatable results.
5. The most critical part of your documentation is the support information. Having the manufacturer phone numbers, site IDs, and device serial numbers (from the configuration detail) at your fingertips will shave critical time off of problem resolution. It is also important to have your integrator's contact information in this section, as they can serve as a liaison with the manufacturer to escalate cases when necessary.
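The minimum change-log record described in item 4 (when, who, what) is small enough to capture in a few lines of Python. This is just one possible shape for it; the class name `ChangeLogEntry`, its field names, and the sample entry are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class ChangeLogEntry:
    when: datetime  # when the change was made
    who: str        # who made it
    what: str       # what was done

    def as_line(self) -> str:
        """Render the entry as one line for a plain-text change-log."""
        return f"{self.when:%Y-%m-%d %H:%M} | {self.who} | {self.what}"

# Hypothetical entry, for illustration only.
entry = ChangeLogEntry(datetime(2009, 5, 12, 14, 30), "jdoe",
                       "Upgraded switch firmware on fabric A")
print(entry.as_line())
```

A spreadsheet or wiki page does the same job; what matters is that all three fields are recorded every time.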